DOI: 10.1145/3597503.3623348
research-article

Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning

Published: 06 February 2024
    Abstract

    Causal discovery is a powerful technique for identifying causal relationships among variables in data, and it has been widely used in software engineering applications. Causal discovery relies heavily on conditional independence (CI) tests, so the quality of its output depends strongly on how well those tests perform, and they are often unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed.
    Although unreliable and excessive CI tests are distinct problems, this paper identifies a unified and principled approach to addressing both. CI statements, the outputs of CI tests, adhere to Pearl's axioms, a set of well-established integrity constraints on conditional independence. Hence, we can detect erroneous CI statements when they violate Pearl's axioms, and prune excessive CI statements when they are logically entailed by other CI statements under Pearl's axioms. Both problems therefore reduce to reasoning about the consistency of CI statements under Pearl's axioms (referred to as the CIR problem).
    We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from both reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency and resort to costly SMT-based reasoning only when necessary. Built on this decision procedure for CIR, CICheck includes two variants: ED-Check, which detects erroneous CI tests (to enhance reliability), and P-Check, which prunes excessive CI tests (to enhance privacy). We evaluate CICheck on four real-world datasets and 100 CIR instances, showing its effectiveness in detecting erroneous CI tests and reducing excessive CI tests while retaining practical performance.
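
For orientation, the constraints in question can be written out. The abstract does not say which axioms the paper's encoding uses, so the block below is an assumption: it lists only the four semi-graphoid axioms commonly attributed to Pearl, for pairwise disjoint variable sets X, Y, W, Z.

```latex
\begin{align*}
\text{Symmetry:}      \quad & (X \perp\!\!\!\perp Y \mid Z) \iff (Y \perp\!\!\!\perp X \mid Z) \\
\text{Decomposition:} \quad & (X \perp\!\!\!\perp Y \cup W \mid Z) \implies (X \perp\!\!\!\perp Y \mid Z) \\
\text{Weak union:}    \quad & (X \perp\!\!\!\perp Y \cup W \mid Z) \implies (X \perp\!\!\!\perp Y \mid Z \cup W) \\
\text{Contraction:}   \quad & (X \perp\!\!\!\perp Y \mid Z) \land (X \perp\!\!\!\perp W \mid Z \cup Y) \implies (X \perp\!\!\!\perp Y \cup W \mid Z)
\end{align*}
```

Under these rules, a CI test reporting that X and Y are dependent given Z is inconsistent with an earlier statement that X is independent of Y ∪ W given Z, because decomposition entails the opposite.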
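
The two uses of CI reasoning described above, detecting a contradictory CI statement and skipping a CI test whose outcome is already entailed, can be illustrated with a small brute-force sketch. This is not CICheck's SMT encoding or its four-stage decision procedure. It assumes only the four semi-graphoid rules (symmetry, decomposition, weak union, contraction), and the names `CI`, `ci`, `closure`, `find_violation`, and `can_skip_test` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, NamedTuple, Optional, Set


class CI(NamedTuple):
    """The statement 'x is independent of y given z' (sets assumed pairwise disjoint)."""
    x: FrozenSet[str]
    y: FrozenSet[str]
    z: FrozenSet[str]


def ci(x: Iterable[str], y: Iterable[str], z: Iterable[str] = ()) -> CI:
    # Convenience constructor; a string such as "BC" is read as variables B and C.
    return CI(frozenset(x), frozenset(y), frozenset(z))


def proper_nonempty_subsets(s: FrozenSet[str]) -> Iterator[FrozenSet[str]]:
    items = sorted(s)
    for r in range(1, len(items)):
        for combo in combinations(items, r):
            yield frozenset(combo)


def closure(independent: Iterable[CI]) -> Set[CI]:
    """Close independence statements under the four semi-graphoid rules (fixpoint)."""
    known = set(independent)
    while True:
        new: Set[CI] = set()
        for s in known:
            new.add(CI(s.y, s.x, s.z))              # symmetry
            for part in proper_nonempty_subsets(s.y):
                rest = s.y - part
                new.add(CI(s.x, part, s.z))         # decomposition
                new.add(CI(s.x, part, s.z | rest))  # weak union
        for a in known:                             # contraction
            for b in known:
                if a.x == b.x and b.z == a.z | a.y:
                    new.add(CI(a.x, a.y | b.y, a.z))
        if new <= known:
            return known
        known |= new


def find_violation(independent: Iterable[CI], dependent: Iterable[CI]) -> Optional[CI]:
    """Return a dependence claim whose independence is entailed, or None."""
    entailed = closure(independent)
    for claim in dependent:
        if claim in entailed:
            return claim
    return None


def can_skip_test(query: CI, independent: Iterable[CI]) -> bool:
    """True if the query's outcome (independent) already follows from known statements."""
    return query in closure(independent)


known = {ci("A", "BC", "D")}                           # A independent of {B, C} given D
bad = find_violation(known, {ci("A", "B", "D")})       # claimed: A depends on B given D
skip = can_skip_test(ci("B", "A", "CD"), known)        # follows by weak union + symmetry
```

The closure is exponential in the size of the variable sets, so it only suits tiny examples; that cost is consistent with the abstract's motivation for lightweight checks that fall back to SMT-based reasoning only when necessary.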



    Published In

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN: 9798400702174
    DOI: 10.1145/3597503

    Sponsors

    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. causal discovery
    2. conditional independence
    3. SMT

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%
