Retrospective provenance without a runtime provenance recorder

Published: 08 July 2015
DOI: 10.5555/2814579.2814580

Abstract

The YesWorkflow (YW) toolkit aims to provide users of scripting languages such as Python, Perl, and R with many of the benefits of scientific workflow automation. YW requires neither the use of a workflow engine nor the overhead of adapting or instrumenting code to run in such a system. Instead, YW enables scientists to annotate their scripts with special comments that reveal the main computational blocks and dataflow dependencies otherwise implicit in scripts. YW tools extract and analyze these comments, represent scripts in terms of entities based on a typical scientific workflow model, and provide graphical workflow views (i.e., prospective provenance) of scripts. In this paper, we present a new extension of YW for inferring retrospective provenance from script executions without relying on a runtime provenance recorder. Instead, we exploit the common practice among scientists of embedding important pieces of provenance in directory structures and file names. For such "provenance-friendly" data organizations, we offer a new annotation mechanism based on URI templates. YW uses these templates to link conceptual-level prospective provenance with data files created at runtime, resulting in a powerful, integrated model of prospective and retrospective provenance. We present scientifically meaningful retrospective provenance queries for investigating an execution of a data acquisition workflow implemented as a Python script, and show how these queries can be evaluated using the YW toolkit.
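The core of the URI-template idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the YW implementation: the annotation syntax is simplified, and the script text, file layout, port name `raw_image`, template variables `sample_id` and `energy`, and the helpers `parse_uri_templates`, `template_to_regex`, and `infer_retrospective` are all hypothetical. It shows how a template declared in a comment (prospective provenance) can be matched against files found after a run, recovering the template variables from each path as retrospective provenance that can then be queried.

```python
from __future__ import annotations

import re

# Hypothetical annotated script header in the style of YW comment tags.
# The tag spellings (@begin/@in/@out/@uri/@end), the "file:" prefix, the
# directory layout, and the variable names are illustrative assumptions.
SCRIPT = """
# @begin collect_data
# @in sample_id
# @out raw_image @uri file:run/raw/{sample_id}/{sample_id}_{energy}eV.raw
# @end collect_data
"""

# Matches "@in|@out <port> @uri file:<template>" on a single comment line.
URI_ANNOTATION = re.compile(r"@(?:in|out)\s+(\w+)\s+@uri\s+file:(\S+)")


def parse_uri_templates(script: str) -> dict[str, str]:
    """Map each annotated port name to its URI template (prospective side)."""
    templates = {}
    for line in script.splitlines():
        match = URI_ANNOTATION.search(line)
        if match:
            templates[match.group(1)] = match.group(2)
    return templates


def template_to_regex(template: str) -> re.Pattern[str]:
    """Turn '{var}' placeholders into named regex groups.

    Literal text is escaped; a variable that appears twice must bind the
    same value both times, which is enforced with a backreference.
    """
    # With a capturing group, re.split alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern, seen = "", set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        elif part in seen:
            pattern += f"(?P={part})"
        else:
            seen.add(part)
            pattern += f"(?P<{part}>[^/]+?)"  # never spans a "/" separator
    return re.compile(pattern)


def infer_retrospective(
    templates: dict[str, str], paths: list[str]
) -> list[tuple[str, str, dict[str, str]]]:
    """Link each file to the port whose template it fully matches,
    recording the variable bindings recovered from its path."""
    records = []
    for port, template in templates.items():
        regex = template_to_regex(template)
        for path in sorted(paths):
            match = regex.fullmatch(path)
            if match:
                records.append((port, path, match.groupdict()))
    return records


# Hypothetical file listing observed after a run.
files = [
    "run/raw/DRT240/DRT240_11000eV.raw",
    "run/raw/DRT240/DRT240_10000eV.raw",
    "run/raw/DRT322/DRT240_10000eV.raw",  # directory and file name disagree
    "run/notes.txt",
]

records = infer_retrospective(parse_uri_templates(SCRIPT), files)

# A simple retrospective query: which raw images came from sample DRT240?
drt240 = [path for _, path, bindings in records if bindings["sample_id"] == "DRT240"]
print(drt240)
# → ['run/raw/DRT240/DRT240_10000eV.raw', 'run/raw/DRT240/DRT240_11000eV.raw']
```

Reusing a variable within one template (here `sample_id`) doubles as a consistency check: a file whose directory and file name disagree is not linked to the port at all, and unrelated files such as `run/notes.txt` are ignored.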

Cited By

  • (2020) Towards automated, provenance-driven security audit for git-based repositories: applied to Germany's corona-warn-app: vision paper. Proceedings of the 3rd ACM SIGSOFT International Workshop on Software Security from Design to Deployment, pp. 15-18. DOI: 10.1145/3416507.3423190. Online publication date: 9-Nov-2020.
  • (2019) A Survey on Collecting, Managing, and Analyzing Provenance from Scripts. ACM Computing Surveys, 52(3), pp. 1-38. DOI: 10.1145/3311955. Online publication date: 18-Jun-2019.
  • (2018) Mechanisms for provenance collection in scientific workflow systems. Computing, 100(5), pp. 439-472. DOI: 10.1007/s00607-017-0578-1. Online publication date: 1-May-2018.
  • (2016) Addressing Scientific Rigor in Data Analytics Using Semantic Workflows. Proceedings of the 6th International Workshop on Provenance and Annotation of Data and Processes, Volume 9672, pp. 187-190. DOI: 10.5555/3090188.3090211. Online publication date: 7-Jun-2016.
  • (2016) Yin & Yang. Proceedings of the 6th International Workshop on Provenance and Annotation of Data and Processes, Volume 9672, pp. 161-165. DOI: 10.5555/3090188.3090205. Online publication date: 7-Jun-2016.

Published In

TaPP'15: Proceedings of the 7th USENIX Conference on Theory and Practice of Provenance
July 2015
14 pages

Publisher

USENIX Association
United States

Acceptance Rates

Overall acceptance rate: 10 of 17 submissions (59%)

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 18 Feb 2025.