Scientific workflows are becoming increasingly popular for compute-intensive and data-intensive scientific applications. The vision and promise of scientific workflows include rapid, easy workflow design, reuse, scalable execution, and other advantages, such as facilitating “reproducible science” through provenance (e.g., data lineage) support. However, as described in the paper, important research challenges remain. While the database community has studied (business) workflow technologies extensively in the past, most current work in scientific workflows seems to be done outside of the database community, e.g., by practitioners and researchers in the computational sciences and eScience. We provide a brief introduction to scientific workflows and provenance, and identify areas and problems that suggest new opportunities for database research.
Here we ignore a number of details, e.g., actor ports, subworkflows “hidden” within so-called composite actors, etc.
Similarly, in business process modeling, more abstract models, e.g., BPMN, and simple, structured models (e.g., series-parallel graphs) can be easier to understand and reuse than unstructured or lower-level models, e.g., Petri nets.
A physical shim is a thin strip of metal for aligning pipes.
This shim actor turns a data array token into a sequence of individual data tokens.
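As a minimal sketch of what such a shim does, assuming tokens can be modeled as plain Python values and an actor as a generator function (the name `array_to_sequence` is hypothetical and not taken from any workflow system):

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def array_to_sequence(array_token: Iterable[T]) -> Iterator[T]:
    """Hypothetical shim actor: emit each element of one array token as its own token."""
    for element in array_token:
        yield element


# One array token arrives on the input; downstream actors see individual tokens.
tokens = list(array_to_sequence([3, 1, 4]))
```

The generator form is an illustrative choice: it hands elements downstream one at a time, so a consumer can start before the whole array has been unpacked.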
As of July 2012; see http://www.myexperiment.org.
See, for example, Amazon’s Simple Storage Service (S3) and Simple Workflow Service (SWF); http://aws.amazon.com.
Work supported in part by NSF awards OCI-0830944, OCI-0722079, DGE-0841297, and DBI-0960535.
