research-article

An introduction to Docker for reproducible research

Published: 20 January 2015

Abstract

As computational work becomes increasingly integral to many aspects of scientific research, computational reproducibility has become an issue of growing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straightforward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes reproducing and extending such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research, such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a 'DevOps' philosophy, to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.
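
To make the abstract's "versioning" and "modular re-usable elements" concrete, here is a minimal sketch in Python that renders a Dockerfile for an R environment. It is not taken from the paper: the `rocker/r-ver` base image, the tag `4.3.2`, the package list, and the helper name `render_dockerfile` are all assumptions chosen for illustration. The pinned `FROM` tag stands for versioning; the `RUN` step layered on a shared base image stands for modular re-use.

```python
def render_dockerfile(r_version: str, packages: list[str]) -> str:
    """Return Dockerfile text that layers R packages on a version-pinned base image.

    Hypothetical helper for illustration; the rocker/r-ver base image is an assumption.
    """
    # Versioning: refuse a missing or floating tag so the base environment cannot drift silently.
    if not r_version or r_version == "latest":
        raise ValueError("pin an explicit R version tag, not 'latest'")
    lines = [f"FROM rocker/r-ver:{r_version}"]
    if packages:
        # Modularity: this RUN step becomes one cached layer on top of the shared base image.
        # Sorting keeps the generated text identical for the same set of packages.
        pkg_vector = ", ".join(f"'{p}'" for p in sorted(packages))
        lines.append(f'RUN R -e "install.packages(c({pkg_vector}))"')
    # Default command: start an interactive R session.
    lines.append('CMD ["R"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("4.3.2", ["knitr", "ggplot2"]), end="")
```

Saved as `Dockerfile`, the output could be built with `docker build -t my-analysis .` (the image name is hypothetical). The base tag pins R itself, but `install.packages()` still fetches whatever package versions the configured repository serves at build time, so sharing the built image, rather than only the Dockerfile, preserves the exact versions that were installed.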

Published In

ACM SIGOPS Operating Systems Review, Volume 49, Issue 1
Special Issue on Repeatability and Sharing of Experimental Artifacts
January 2015
155 pages
ISSN: 0163-5980
DOI: 10.1145/2723872

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGOPS Volume 49, Issue 1


Article Metrics

  • Downloads (last 12 months): 598
  • Downloads (last 6 weeks): 55
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Diseño e implementación de una aplicación web para la administración de servicios dockerizados sobre el proxy inverso del servidor de investigaciones de la Facultad de Ingenierías de la Institución Universitaria Antonio José Camacho. “Dockerwizard” [Design and implementation of a web application for managing dockerized services behind the reverse proxy of the research server of the Faculty of Engineering, Institución Universitaria Antonio José Camacho: "Dockerwizard"]. Revista Sapientía 16:31. DOI: 10.54278/sapientia.v16i31.172. Online publication date: 30-Jan-2024.
  • (2024) Enhancing Monitoring Performance: A Microservices Approach to Monitoring with Spyware Techniques and Prediction Models. Sensors 24:13 (4212). DOI: 10.3390/s24134212. Online publication date: 28-Jun-2024.
  • (2024) Simplifying Land Cover-Geoprocessing-Model Migration with a PAMC-LC Containerization Strategy in the Open Web Environment. ISPRS International Journal of Geo-Information 13:6 (187). DOI: 10.3390/ijgi13060187. Online publication date: 3-Jun-2024.
  • (2024) Bridging the Gap between Project-Oriented and Exercise-Oriented Automatic Assessment Tools. Computers 13:7 (162). DOI: 10.3390/computers13070162. Online publication date: 30-Jun-2024.
  • (2024) The eDNA-Container App: A Simple-to-Use Cross-Platform Package for the Reproducible Analysis of eDNA Sequencing Data. Applied Sciences 14:6 (2641). DOI: 10.3390/app14062641. Online publication date: 21-Mar-2024.
  • (2024) Microbiome modeling: a beginner's guide. Frontiers in Microbiology 15. DOI: 10.3389/fmicb.2024.1368377. Online publication date: 19-Jun-2024.
  • (2024) From Model Performance to Claim: How a Change of Focus in Machine Learning Replicability Can Help Bridge the Responsibility Gap. SSRN Electronic Journal. DOI: 10.2139/ssrn.4806609. Online publication date: 2024.
  • (2024) gcamreport: An R tool to process and standardize GCAM outputs. Journal of Open Source Software 9:96 (5975). DOI: 10.21105/joss.05975. Online publication date: Apr-2024.
  • (2024) A Software Tool for Hybrid Earthquake Forecasting in New Zealand. Seismological Research Letters. DOI: 10.1785/0220240196. Online publication date: 26-Jul-2024.
  • (2024) BASiCS workflow: a step-by-step analysis of expression variability using single cell RNA sequencing data. F1000Research 11 (59). DOI: 10.12688/f1000research.74416.2. Online publication date: 7-May-2024.