Testing Causality in Scientific Modelling Software

Published: 24 November 2023

Abstract

From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in developing scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, which render existing testing techniques impractical. In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as causal inference has addressed similar problems, enabling causal conclusions to be inferred from noisy, biased, and sparse data instead of costly experiments. This article introduces the causal testing framework: a framework that uses causal inference techniques to establish causal effects from existing data, enabling users to conduct software testing activities that concern the effect of a change, such as metamorphic testing, a posteriori. We present three case studies covering real-world scientific models, demonstrating how the causal testing framework can infer metamorphic test outcomes from reused, confounded test data to provide an efficient solution for testing scientific modelling software.
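
To make the central idea concrete, the following is a minimal Python sketch and not the authors' framework or its API. It assumes a hypothetical toy model whose logged runs are confounded: a variable `z` influences both the input of interest `x` and the output `y`. The names `simulate_runs` and `ols_slope`, the coefficients, and the expected effect of 2.0 are all illustrative assumptions. Regression adjustment is used here only because it is one of the simplest causal inference estimators; it shows how a metamorphic-style check ("increasing `x` by one unit should raise `y` by about 2") might be evaluated a posteriori from existing data instead of fresh executions.

```python
import numpy as np


def simulate_runs(n, rng):
    """Stand-in for logged executions of a model (hypothetical toy data)."""
    z = rng.normal(size=n)                    # confounder: affects both x and y
    x = 0.8 * z + rng.normal(size=n)          # input of interest, correlated with z
    # Output; by construction the true causal effect of x on y is 2.0.
    y = 2.0 * x + 3.0 * z + rng.normal(scale=0.5, size=n)
    return x, z, y


def ols_slope(y, *columns):
    """Least-squares coefficient of the first column, with an intercept."""
    design = np.column_stack([*columns, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]


rng = np.random.default_rng(0)
x, z, y = simulate_runs(5000, rng)

# Naive estimate ignores the confounder, so it mixes the effects of x and z.
naive = ols_slope(y, x)
# Adjusted estimate conditions on z, the adjustment set in this toy causal graph.
adjusted = ols_slope(y, x, z)

# Metamorphic-style check against an assumed expected effect.
expected_effect = 2.0
tolerance = 0.1
naive_passes = abs(naive - expected_effect) <= tolerance
adjusted_passes = abs(adjusted - expected_effect) <= tolerance

print(f"naive estimate:    {naive:.2f} (check passes: {naive_passes})")
print(f"adjusted estimate: {adjusted:.2f} (check passes: {adjusted_passes})")
```

In this sketch the naive estimate is biased upwards because `z` raises both `x` and `y`, so a test based on it would report a spurious failure. Adjusting for `z` recovers an estimate close to the effect built into the toy data. Choosing the adjustment set requires knowing the causal structure, which is why a causal graph is a natural input to this kind of testing.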

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1
January 2024
933 pages
EISSN: 1557-7392
DOI: 10.1145/3613536
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2023
Online AM: 12 July 2023
Accepted: 20 June 2023
Revised: 14 June 2023
Received: 01 September 2022
Published in TOSEM Volume 33, Issue 1

Author Tags

  1. Software testing
  2. causal inference
  3. causal testing

Qualifiers

  • Research-article

Funding Sources

  • EPSRC CITCoM

Cited By

  • (2025) Causal reasoning in Software Quality Assurance: A systematic review. Information and Software Technology 178 (107599). DOI: 10.1016/j.infsof.2024.107599. Online publication date: Feb-2025
  • (2024) Can causality accelerate experimentation in software systems? Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI (280-281). DOI: 10.1145/3644815.3644985. Online publication date: 14-Apr-2024
  • (2024) Causal Test Adequacy. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST) (161-172). DOI: 10.1109/ICST60714.2024.00023. Online publication date: 27-May-2024
  • (2024) CausalOps — Towards an industrial lifecycle for causal probabilistic graphical models. Information and Software Technology 174:C. DOI: 10.1016/j.infsof.2024.107520. Online publication date: 1-Oct-2024
