research-article

An introduction to Docker for reproducible research

Published: 20 January 2015

Abstract

As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straightforward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research, such as operating system virtualization, cross-platform portability, modular reusable elements, versioning, and a 'DevOps' philosophy, to address these challenges. I illustrate this with several examples of Docker use, with a focus on the R statistical environment.

Published In

ACM SIGOPS Operating Systems Review  Volume 49, Issue 1
Special Issue on Repeatability and Sharing of Experimental Artifacts
January 2015
155 pages
ISSN:0163-5980
DOI:10.1145/2723872

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2025) Container image management in cloud-edge environments: an image deletion method based on layer affinity. Fourth International Conference on Computer Vision, Application, and Algorithm (CVAA 2024), (138). DOI: 10.1117/12.3055920. Online publication date: 9-Jan-2025
  • (2025) The imperative for reproducibility in building performance simulation research. Journal of Building Performance Simulation, (1-7). DOI: 10.1080/19401493.2024.2441385. Online publication date: 8-Jan-2025
  • (2025) Characterising reproducibility debt in scientific software: A systematic literature review. Journal of Systems and Software, 222 (112327). DOI: 10.1016/j.jss.2024.112327. Online publication date: Apr-2025
  • (2025) SCASS: Breaking into SCADA Systems Security. Computers & Security, 151 (104315). DOI: 10.1016/j.cose.2025.104315. Online publication date: Apr-2025
  • (2025) Practical Application of Large‐Scale Blockchain. Principles and Applications of Blockchain Systems, (347-376). DOI: 10.1002/9781394237258.ch9. Online publication date: 3-Jan-2025
  • (2024) Diseño e implementación de una aplicación web para la administración de servicios dockerizados sobre el proxy inverso del servidor de investigaciones de la Facultad de Ingenierías de la Institución Universitaria Antonio José Camacho. “Dockerwizard”. Revista Sapientía, 16:31. DOI: 10.54278/sapientia.v16i31.172. Online publication date: 30-Jan-2024
  • (2024) Transforming Hydrology Python Packages into Web Application Programming Interfaces: A Comprehensive Workflow Using Modern Web Technologies. Water, 16:18 (2609). DOI: 10.3390/w16182609. Online publication date: 14-Sep-2024
  • (2024) Enhancing Monitoring Performance: A Microservices Approach to Monitoring with Spyware Techniques and Prediction Models. Sensors, 24:13 (4212). DOI: 10.3390/s24134212. Online publication date: 28-Jun-2024
  • (2024) AI-Enhanced Blockchain for Scalable IoT-Based Supply Chain. Logistics, 8:4 (109). DOI: 10.3390/logistics8040109. Online publication date: 4-Nov-2024
  • (2024) Simplifying Land Cover-Geoprocessing-Model Migration with a PAMC-LC Containerization Strategy in the Open Web Environment. ISPRS International Journal of Geo-Information, 13:6 (187). DOI: 10.3390/ijgi13060187. Online publication date: 3-Jun-2024