research-article
Open access

Methods included: standardizing computational reuse and portability with the Common Workflow Language

Published: 20 May 2022

Abstract

Standardizing computational reuse and portability with the Common Workflow Language.

Supplementary Material

PDF File (p54-crusoe-supp.pdf)
Supplementary material.

Published In

Communications of the ACM, Volume 65, Issue 6
June 2022
98 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3538687
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

  • Research-article
  • Popular
  • Refereed
