research-article

Testing Causality in Scientific Modelling Software

Authors: Andrew G. Clark, Michael Foster, Benedikt Prifling, Neil Walkinshaw, Robert M. Hierons, Volker Schmidt, Robert D. Turner

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1

Article No.: 10, Pages 1 - 42

https://doi.org/10.1145/3607184

Published: 24 November 2023

Abstract

From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in developing scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as causal inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse data instead of costly experiments. This article introduces the causal testing framework: a framework that uses causal inference techniques to establish causal effects from existing data, enabling users to conduct software testing activities concerning the effect of a change, such as metamorphic testing, a posteriori. We present three case studies covering real-world scientific models, demonstrating how the causal testing framework can infer metamorphic test outcomes from reused, confounded test data to provide an efficient solution for testing scientific modelling software.
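The idea of inferring a metamorphic test outcome from confounded data can be illustrated with a minimal sketch. It is not the authors' framework: the synthetic data, the linear model, the tolerance, and every name (`simulate`, `adjusted_effect`, `relation_holds`) are assumptions made for illustration. It assumes a single, measured confounder `z` that influences both the model input `x` and the output `y`, and compares a naive estimate of the effect of `x` with one that adjusts for `z`.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (intercept included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residuals(xs, ys):
    """What is left of ys after regressing it on xs (intercept included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]


def adjusted_effect(x, y, z):
    """Effect of x on y after adjusting for z.

    Uses the Frisch-Waugh-Lovell result: regress both x and y on z,
    then regress the y-residuals on the x-residuals.
    """
    return slope(residuals(z, x), residuals(z, y))


def simulate(n=5000, seed=0):
    """Hypothetical 'existing test data' in which z confounds x and y.

    The true causal effect of x on y is 2.0 by construction.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]  # z influences the input
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z


x, y, z = simulate()
naive = slope(x, y)                # biased: absorbs the effect of z
adjusted = adjusted_effect(x, y, z)

# Metamorphic-style check (assumed relation): raising x by one unit
# should raise y by about 2.0, within an assumed tolerance of 0.1.
relation_holds = abs(adjusted - 2.0) < 0.1

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
print(f"relation holds:    {relation_holds}")
```

In this sketch the naive estimate is inflated because `z` drives both variables, while the adjusted estimate lands close to the effect built into the simulation. No new executions of the model are needed; the check is evaluated on data that already exists, which is the a posteriori aspect the abstract describes.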

Index Terms

Testing Causality in Scientific Modelling Software
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Causal reasoning and diagnostics
  2. Modeling and simulation
    1. Model development and analysis
      1. Model verification and validation
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

ACM Transactions on Software Engineering and Methodology Volume 33, Issue 1

January 2024

933 pages

EISSN:1557-7392

DOI:10.1145/3613536

Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2023

Online AM: 12 July 2023

Accepted: 20 June 2023

Revised: 14 June 2023

Received: 01 September 2022

Published in TOSEM Volume 33, Issue 1

Qualifiers

Research-article

Funding Sources

EPSRC CITCoM
