A Distributed System for The Management of Fine-grained Provenance

Published: 01 April 2015

Abstract

Existing provenance systems operate at a single layer of abstraction (workflow, process, or OS) at which they record and store provenance. However, provenance captured from different layers provides the greatest benefit when integrated through a unified provenance framework. The first step toward building such a framework is a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularities. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer, and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real-world data processing systems. They further introduce a distributed data provenance middleware system composed of several components and services that capture provenance according to the model and securely store it in a central repository. As part of this middleware, the authors present a thin stackable file system, called FiPS, for capturing local provenance in a portable manner. FiPS is able to capture provenance at various degrees of granularity, transform provenance records into secure information, and direct the resulting provenance data to various persistent storage systems.

Cited By

  • (2018) Scalable Privacy-Preserving Big Data Management and Analytics. Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing, pp. 52-56. 10.1145/3264560.3266429. Online publication date: 3-Aug-2018
  • (2018) Pattern mining based compression of IoT data. Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking, pp. 1-6. 10.1145/3170521.3170533. Online publication date: 4-Jan-2018
  • (2018) Pedigree-ing your big data. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 675-681. 10.1109/CCGRID.2018.00100. Online publication date: 1-May-2018

    Published In

    Journal of Database Management  Volume 26, Issue 2
    April 2015
    61 pages

    Publisher

    IGI Global

    United States

    Author Tags

    1. Abstraction
    2. Data Provenance
    3. Digital Provenance Cycle Provenance
    4. System-call Based Systems

    Qualifiers

    • Article
