DOI: 10.1145/2452376.2452478
tutorial

The W3C PROV family of specifications for modelling provenance metadata

Published: 18 March 2013

    Abstract

    Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains. This tutorial provides an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.

    References

    [1]
    P. Agrawal, O. Benjelloun, et al. Trio: a system for data, uncertainty, and lineage. In Proceedings of the 32nd international conference on Very large data bases, VLDB '06, pages 1151--1154. VLDB Endowment, 2006.
    [2]
    K. Belhajjame et al. Workflow-centric research objects: First class citizens in scholarly discourse. In Proceedings of Sepublica 2012, pages 1--12, Hersonissos, 2012.
    [3]
    P. Buneman, S. Khanna, and W. C. Tan. Why and Where: A Characterization of Data Provenance. In ICDT, pages 316--330, 2001.
    [4]
    I. Celino, S. Contessa, M. Corubolo, et al. Linking smart cities datasets with human computation - the case of UrbanMatch. In P. Cudré-Mauroux et al., editors, ISWC, volume 7650 of Lecture Notes in Computer Science, pages 34--49. Springer, 2012.
    [5]
    J. Cheney, L. Chiticariu, and W.-C. Tan. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases, 1:379--474, 2009.
    [6]
    J. Cheney, A. Finkelstein, B. Ludaescher, and S. Vansummeren. Principles of Provenance (Dagstuhl Seminar 12091). Dagstuhl Reports, 2(2):84--113, 2012.
    [7]
    L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: a post-it system for relational databases based on provenance. In SIGMOD, 2005.
    [8]
    V. Cuevas-Vicenttin, S. Dey, and B. Ludaescher. Modeling and querying scientific workflow provenance in the D-OPM. In WORKS. ACM, 2012.
    [9]
    E. Deelman, D. Gannon, M. S. Shields, and I. Taylor. Workflows and e-science: An overview of workflow system features and capabilities. Future Generation Comp. Syst., 25(5):528--540, 2009.
    [10]
    M. Ebden, T. D. Huynh, L. Moreau, et al. Network analysis on provenance graphs from a crowdsourcing application. In Groth and Frew [12], pages 168--182.
    [11]
    R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89--124, 2005.
    [12]
    P. T. Groth and J. Frew, editors. 4th International Provenance and Annotation Workshop, IPAW 2012, Santa Barbara, CA, USA, June 19--21, 2012, volume 7525 of Lecture Notes in Computer Science. Springer, 2012.
    [13]
    G. Karvounarakis, Z. G. Ives, and V. Tannen. Querying data provenance. In SIGMOD Conference, 2010.
    [14]
    P. Missier and K. Belhajjame. A PROV encoding for provenance analysis using deductive rules. In Procs. IPAW'12, Santa Barbara, California, 2012. Springer-Verlag, Lecture Notes in Computer Science.
    [15]
    P. Missier, S. Soiland-Reyes, S. Owen, et al. Taverna, reloaded. In Procs. SSDBM 2010, volume 6187 of Lecture Notes in Computer Science, pages 471--481, Heidelberg, Germany, 2010. Springer.
    [16]
    H. Yang, D. T. Michaelides, C. Charlton, et al. Deep: A provenance-aware executable document system. In Groth and Frew [12], pages 24--38.

    Reviews

    Yingjie Li

    Provenance is information about the entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. This paper focuses on a new approach to modelling provenance that uses the standard PROV model recommended by the World Wide Web Consortium (W3C). The W3C PROV model defines a core model for provenance representation. Readers working in the semantic web, provenance, and ontology fields will want to study this work.

    The first part of the paper provides an intuitive overview of the W3C PROV model, with an example that gives a complete account of PROV relations over three types of instances: entities, activities, and agents. In PROV, physical, digital, conceptual, or other kinds of things are called entities. [...] Activities are how entities come into existence and how their attributes change to become new entities. [...] An agent takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. [1] In addition, the validity of provenance statements is defined with reference to a set of constraints that the statements must satisfy. For instance, when one entity is related to another by the predicate prov:wasDerivedFrom, the entity it was derived from must precede the derived entity.

    The second part of the paper presents a number of applications that use the PROV model to capture provenance information. In Dictionary, the PROV model asserts the membership of a word in a dictionary and records the history of changes (insertions and removals) to its words. In Scientific Workflows, the PROV model captures information about the data products used and generated by the steps that compose a workflow, so that people can more easily debug workflows and reproduce their results. In Executable Documents, the PROV model captures the provenance of each research object to trace its evolution over time. In Smart Cities, the PROV model records provenance information about citizens and their contributions to assist in the verification of collected data.

    The paper would have been more complete if the authors had provided a deeper analysis of how the applications apply the PROV model in terms of provenance modeling and querying.

    Online Computing Reviews Service
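
    The derivation constraint mentioned in the review can be illustrated with a minimal Python sketch. It is not the PROV data model or any PROV library API: the `ProvSketch` class, its method names, and the `ex:` identifiers are hypothetical, and the check covers only one ordering rule (the source of a derivation must be generated strictly before the derived entity), under the assumption that each entity carries a single generation time.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProvSketch:
    """Hypothetical, simplified store for a few PROV-style statements."""
    generated_at: dict = field(default_factory=dict)  # entity id -> generation time
    derivations: list = field(default_factory=list)   # (derived entity, source entity)
    associations: list = field(default_factory=list)  # (activity id, agent id)

    def entity(self, entity_id, generated):
        # Record an entity together with the time it came into existence.
        self.generated_at[entity_id] = generated

    def was_derived_from(self, derived, source):
        # Mirrors the argument order of prov:wasDerivedFrom (derived, then source).
        self.derivations.append((derived, source))

    def was_associated_with(self, activity, agent):
        # An agent bears some responsibility for an activity.
        self.associations.append((activity, agent))

    def ordering_violations(self):
        """Return derivations whose source is not generated strictly
        before the derived entity. Pairs with unknown times are skipped."""
        violations = []
        for derived, source in self.derivations:
            t_derived = self.generated_at.get(derived)
            t_source = self.generated_at.get(source)
            if t_derived is None or t_source is None:
                continue
            if not t_source < t_derived:
                violations.append((derived, source))
        return violations


# Illustrative data only; the identifiers and dates are made up.
doc = ProvSketch()
doc.entity("ex:draft", datetime(2013, 1, 10))
doc.entity("ex:paper", datetime(2013, 3, 18))
doc.was_associated_with("ex:writing", "ex:author")
doc.was_derived_from("ex:paper", "ex:draft")
valid = doc.ordering_violations()

# A derivation pointing the wrong way in time is reported as a violation.
doc.was_derived_from("ex:draft", "ex:paper")
invalid = doc.ordering_violations()
```

    In this sketch `valid` is empty because the draft precedes the paper, while `invalid` contains the reversed pair. A real validator would cover the full set of constraints defined for PROV documents rather than this single rule.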

    Published In

    EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
    March 2013
    793 pages
    ISBN: 9781450315975
    DOI: 10.1145/2452376

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 March 2013

    Qualifiers

    • Tutorial

    Conference

    EDBT/ICDT '13

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (Last 12 months): 54
    • Downloads (Last 6 weeks): 7

    Cited By

    • (2024) DLProv: A Data-Centric Support for Deep Learning Workflow Analyses. Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, pages 77-85. DOI: 10.1145/3650203.3663337. Online publication date: 9-Jun-2024
    • (2024) PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems. IEEE Transactions on Parallel and Distributed Systems, 35:5, pages 844-861. DOI: 10.1109/TPDS.2024.3374555. Online publication date: May-2024
    • (2024) Managing Provenance Data in Knowledge Graph Management Platforms. Datenbank-Spektrum, 24:1, pages 43-52. DOI: 10.1007/s13222-023-00463-0. Online publication date: 5-Feb-2024
    • (2023) Data Management and Ontology Development for Provenance-Aware Organizations in Linked Data Space. European Journal of Technic. DOI: 10.36222/ejt.1402149. Online publication date: 26-Dec-2023
    • (2023) Data Provenance in Biomedical Research: Scoping Review. Journal of Medical Internet Research, 25, e42289. DOI: 10.2196/42289. Online publication date: 27-Mar-2023
    • (2023) OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Query Event Logs. Proceedings of the VLDB Endowment, 16:12, pages 3662-3675. DOI: 10.14778/3611540.3611555. Online publication date: 1-Aug-2023
    • (2023) Dataset Discovery and Exploration: A Survey. ACM Computing Surveys, 56:4, pages 1-37. DOI: 10.1145/3626521. Online publication date: 9-Nov-2023
    • (2023) Building an Open Representation for Biological Protocols. ACM Journal on Emerging Technologies in Computing Systems, 19:3, pages 1-21. DOI: 10.1145/3604568. Online publication date: 23-Jun-2023
    • (2023) Data Provenance in Security and Privacy. ACM Computing Surveys, 55:14s, pages 1-35. DOI: 10.1145/3593294. Online publication date: 22-Apr-2023
    • (2023) Demonstration of Geyser: Provenance Extraction and Applications over Data Science Scripts. Companion of the 2023 International Conference on Management of Data, pages 123-126. DOI: 10.1145/3555041.3589717. Online publication date: 4-Jun-2023
