Article. DOI: 10.5555/2032397.2032415

Provenance-enabled automatic data publishing

Published: 20 July 2011

Abstract

Scientists are increasingly being called upon to publish their data as well as their conclusions. Yet computational science often necessarily occurs in exploratory, unstructured environments: scientists are as likely to use one-off scripts, legacy programs, and volatile collections of data and parametric assumptions as they are to frame their investigations as easily reproducible workflows. The ES3 system can capture the provenance of such unstructured computations and make it available, so that their results can be evaluated in the overall context of their inputs, implementation, and assumptions. We also find that such provenance can serve as an automatic "checklist" for evaluating the suitability of data (or other computational artifacts) for publication. We describe a system that, given a request to publish a particular computational artifact, traverses that artifact's provenance and applies rule-based tests to each of the artifact's computational antecedents to determine whether the provenance is robust enough to justify publication. In general, these tests check that the artifacts are properly curated: for example, that source code is checked into a source control system and that data is accessible from a well-known repository. At a minimum, a publish request yields a report on an object's fitness for publication; such a report can also readily drive an automated cleanup process that remedies many of the identified shortcomings.
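
The traversal-and-test idea described in the abstract can be illustrated with a minimal sketch. This is not the ES3 implementation: the `Artifact` fields, the two rule functions, the in-memory `GRAPH`, and all file names are hypothetical, chosen only to mirror the two curation checks the abstract mentions. The sketch also assumes the requested artifact itself is tested along with its antecedents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Artifact:
    """A node in a toy provenance graph (hypothetical schema, not ES3's)."""
    name: str
    kind: str                          # "code" or "data" (assumed categories)
    in_source_control: bool = False
    repository: Optional[str] = None   # name of a repository, if any
    antecedents: Tuple[str, ...] = ()  # names of artifacts this one derives from


# A rule inspects one artifact and returns a problem description, or None.
Rule = Callable[[Artifact], Optional[str]]


def code_is_versioned(a: Artifact) -> Optional[str]:
    if a.kind == "code" and not a.in_source_control:
        return "source code is not checked into source control"
    return None


def data_is_in_repository(a: Artifact) -> Optional[str]:
    if a.kind == "data" and a.repository is None:
        return "data is not accessible from a well-known repository"
    return None


RULES: Tuple[Rule, ...] = (code_is_versioned, data_is_in_repository)


def publication_report(target: str,
                       graph: Dict[str, Artifact],
                       rules: Tuple[Rule, ...] = RULES) -> List[Tuple[str, str]]:
    """Walk the provenance of `target` and collect (artifact, problem) pairs."""
    report: List[Tuple[str, str]] = []
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:               # shared antecedents are tested once
            continue
        seen.add(name)
        artifact = graph[name]
        for rule in rules:
            problem = rule(artifact)
            if problem is not None:
                report.append((name, problem))
        stack.extend(artifact.antecedents)
    return sorted(report)              # sorted for a stable, readable report


# Hypothetical provenance: a figure derived from a script and a gridded dataset.
GRAPH = {
    "figure.png": Artifact("figure.png", "data",
                           antecedents=("plot.py", "grid.nc")),
    "plot.py": Artifact("plot.py", "code"),
    "grid.nc": Artifact("grid.nc", "data", repository="example-archive",
                        antecedents=("regrid.sh", "raw.dat")),
    "regrid.sh": Artifact("regrid.sh", "code", in_source_control=True),
    "raw.dat": Artifact("raw.dat", "data"),
}

if __name__ == "__main__":
    for name, problem in publication_report("figure.png", GRAPH):
        print(f"{name}: {problem}")
```

An empty report would indicate that every artifact reached in the traversal passes the rules; a non-empty one is the kind of "checklist" that an automated cleanup step could act on.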

Cited By

  • (2019) A Survey on Collecting, Managing, and Analyzing Provenance from Scripts. ACM Computing Surveys 52:3 (1-38). DOI: 10.1145/3311955. Online publication date: 18-Jun-2019

Published In

SSDBM'11: Proceedings of the 23rd international conference on Scientific and statistical database management
July 2011
601 pages
ISBN:9783642223501

Sponsors

  • Paradigm4 Inc.
  • Microsoft Research
  • Gordon and Betty Moore Foundation
  • eScience Institute

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. curation
  2. provenance
  3. publishing

Qualifiers

  • Article
