survey

Reproducibility in Scientific Computing

Published: 16 July 2018
  • Abstract

    Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today’s computational work is adequately reproducible. In principle, it should be possible to specify a computation in sufficient detail that anyone can reproduce it exactly. But in practice, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Technical barriers to reproducibility are described, extant approaches are surveyed, and open areas of research are identified.

    [168]
    Sander Van Der Burg, Merijn de Jonge, Eelco Dolstra, and Eelco Visser. 2009. Software deployment in a dynamic cloud: From device to service orientation in a hospital environment. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing. IEEE Computer Society, 61--66.
    [169]
    Mayank Varia, Benjamin Price, Nicholas Hwang, Ariel Hamlin, Jonathan Herzog, Jill Poland, Michael Reschly, Sophia Yakoubov, and Robert K. Cunningham. 2015. Automated assessment of secure search systems. ACM SIGOPS Oper. Syst. Rev. 49, 1 (2015), 22--30.
    [170]
    H. M. W. Verbeek, Alexander Hirnschall, and Wil M. P. van der Aalst. 2002. XRL/flower: Supporting inter-organizational workflows using XML/Petri-net technology. In International Workshop on Web Services, E-Business, and the Semantic Web. Springer, 93--108.
    [171]
    Gregor Von Laszewski, Mihael Hategan, and Deepti Kodeboyina. 2007. Java CoG kit workflow. In Workflows for e-Science. Springer, 340--356.
    [172]
    Greg Wilson, Dhavide A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven HD Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, et al. 2014. Best practices for scientific computing. PLoS Biol. 12, 1 (2014), e1001745.
    [173]
    Roundtable Participants Yale. 2010. Reproducible research. Comput. Sci. Eng. 12, 5 (2010), 8--13.
    [174]
    Jia Yu and Rajkumar Buyya. 2004. A novel architecture for realizing grid workflow using tuple spaces. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing. IEEE, 119--128.
    [175]
    Jia Yu and Rajkumar Buyya. 2005. A taxonomy of workflow management systems for grid computing. J. Grid Comput. 3, 3--4 (2005), 171--200.
    [176]
    Xiang Zhao, Emery R. Boose, Yuriy Brun, Barbara Staudt Lerner, and Leon J. Osterweil. 2013. Supporting undo and redo in scientific data analysis. In TaPP.

    Cited By

    • (2024) Towards Digital Twin-Oriented Complex Networked Systems: Introducing heterogeneous node features and interaction rules. PLOS ONE 19:1 (e0296426). DOI: 10.1371/journal.pone.0296426. Online publication date: 2-Jan-2024
    • (2024) Reproducibility Debt: Challenges and Future Pathways. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (462-466). DOI: 10.1145/3663529.3663778. Online publication date: 10-Jul-2024
    • (2024) The Challenge of Data Analytics with Climate-neutral Urban Mobility (Vision Paper). ACM Transactions on Spatial Algorithms and Systems 10:2 (1-10). DOI: 10.1145/3649312. Online publication date: 1-Jul-2024
    Published In

    ACM Computing Surveys, Volume 51, Issue 3
    May 2019
    796 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/3212709
    Editor: Sartaj Sahni
    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 16 July 2018
    Accepted: 01 February 2018
    Revised: 01 January 2018
    Received: 01 March 2017
    Published in CSUR Volume 51, Issue 3


    Author Tags

    1. Reproducibility
    2. computational science
    3. replicability
    4. reproducible
    5. scientific computing
    6. scientific workflow
    7. scientific workflows
    8. workflow
    9. workflows

    Qualifiers

    • Survey
    • Research
    • Refereed

