PROV‐man: A PROV‐compliant toolkit for provenance management
- Subject Areas
- Data Science, Databases, Digital Libraries, Emerging Technologies
- Keywords
- Provenance, OPM, database design, PROV, ORM, e-science, RDBMS, ER Modeling, Workflow management system, open architecture, Java, Hibernate
- Copyright
- © 2015 Benabdelkader et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. PROV‐man: A PROV‐compliant toolkit for provenance management. PeerJ PrePrints 3:e1102v1 https://doi.org/10.7287/peerj.preprints.1102v1
Abstract
Discoveries in modern science can take years and involve large amounts of data, many people and various tools. Although good scientific practice dictates that findings should be reproducible, very few automated tools actually support traceability of the scientific method employed, in particular when various experimental environments are involved at different research phases. Data provenance tracking approaches can play a major role in addressing many of these challenges. These approaches propose ways to capture, manage, and use provenance information to support the traceability of scientific methods in heterogeneous environments. PROV is a W3C standard that provides a comprehensive model for data and semantics representation, with common vocabularies and rich concepts to describe provenance. Nevertheless, it is difficult for domain scientists to understand and adopt all the richness provided by PROV. In this paper we describe the design and implementation of the provenance manager PROV-man, a PROV-compliant framework that helps scientists integrate provenance capabilities into their data analysis tools. PROV-man provides functionality to create and manipulate provenance data in a consistent manner and ensures its permanent storage. It also provides a set of interfaces to serialize and export provenance data into various data formats, which supports interoperability. The open architecture of PROV-man, consisting of an API and a configurable database, allows for easy deployment within existing and newly developed software tools. The paper presents two examples illustrating the usage of PROV-man. The first example illustrates how to create and manipulate provenance data of an online newspaper article using PROV-man. The second example demonstrates and evaluates the PROV-man implementation in a more complex case: the collection of provenance data about biomedical data analysis activities that are carried out using a distributed computing infrastructure.
Author Comment
This is a submission to PeerJ Computer Science for review.