PROV‐man: A PROV‐compliant toolkit for provenance management

Ammar Benabdelkader; Antoine A.H.C. van Kampen; Silvia D Olabarriaga

doi:10.7287/peerj.preprints.1102v1

Ammar Benabdelkader¹, Antoine A.H.C. van Kampen², Silvia D Olabarriaga²

1 Sharp Systems, Amstelveen, The Netherlands

2 Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

Published: 2015-05-21
Accepted: 2015-05-20

Subject Areas: Data Science, Databases, Digital Libraries, Emerging Technologies
Keywords: Provenance, OPM, database design, PROV, ORM, e-science, RDBMS, ER Modeling, Workflow management system, open architecture, Java, Hibernate

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Benabdelkader A, van Kampen AAHC, Olabarriaga SD. 2015. PROV‐man: A PROV‐compliant toolkit for provenance management. PeerJ PrePrints 3:e1102v1 https://doi.org/10.7287/peerj.preprints.1102v1

Abstract

Discoveries in modern science can take years and involve large amounts of data, many people, and various tools. Although good scientific practice dictates that findings should be reproducible, in practice very few automated tools support traceability of the scientific method employed, in particular when various experimental environments are involved at different research phases. Data provenance tracking approaches can play a major role in addressing many of these challenges. These approaches propose ways to capture, manage, and use provenance information to support the traceability of scientific methods in heterogeneous environments. PROV is a W3C standard that provides a comprehensive model for data and semantics representation, with common vocabularies and rich concepts to describe provenance. Nevertheless, it is difficult for domain scientists to understand and adopt all the richness provided by PROV. In this paper we describe the design and implementation of the provenance manager PROV-man, a PROV-compliant framework that helps scientists integrate provenance capabilities into their data analysis tools. PROV-man provides functionality to create and manipulate provenance data in a consistent manner and ensures its permanent storage. It also provides a set of interfaces to serialize and export provenance data into various data formats, supporting interoperability. The open architecture of PROV-man, consisting of an API and a configurable database, allows for easy deployment within existing and newly developed software tools. The paper presents two examples illustrating the usage of PROV-man. The first example illustrates how to create and manipulate provenance data of an online newspaper article using PROV-man. The second example demonstrates and evaluates the PROV-man implementation in a more complex case: the collection of provenance data about biomedical data analysis activities that are carried out using a distributed computing infrastructure.

Author Comment

This is a submission to PeerJ Computer Science for review.
