DOI: 10.1145/3106426.3106495
research-article

Linked data processing provenance: towards transparent and reusable linked data integration

Published: 23 August 2017

Abstract

The growth of Linked Data has created a promising environment for data exploration, and a growing number of tools allow users to interactively integrate data from various sources. Assessing the reliability of the results of such ad-hoc integration processes, consistently recreating those results, and identifying changes upon re-execution, however, can be difficult. Automated creation of process provenance trails can provide major benefits in this context, because (i) it enables users to trace the contribution of individual sources and processing steps to the final outcome and judge whether the result can be trusted; (ii) it ensures repeatability and raises the trustworthiness of results; and (iii) it ideally enables reconstruction of Linked Data integration processes from the provenance information embedded in the final result. In this paper, we present a provenance model that facilitates automatic generation of semantic provenance information for generic Linked Data integration processes. We implement the generic model in a collaborative mashup environment and evaluate it by means of an example application. We find that the model provides a solid foundation for verifiability and contributes towards making Linked Data integration processes more open, transparent, and reusable, which is crucial in domains where the origin of data is essential, such as statistical analyses, scientific research, and data journalism.
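The abstract does not spell out the model itself. As a minimal sketch, assuming each integration step is recorded with the standard W3C PROV-O terms `prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy` and `prov:wasDerivedFrom`, the Python below illustrates how such a trail can serve goals (i) and (iii): tracing a result back to the sources that contributed to it, and recovering a replayable order of processing steps. The class `ProvenanceTrail`, its methods, and the `example.org` URIs are hypothetical illustrations, not the model proposed in the paper.

```python
from dataclasses import dataclass, field

# W3C PROV-O and RDF namespaces (real vocabularies).
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for the example sources, steps and results.
EX = "http://example.org/"


@dataclass
class ProvenanceTrail:
    """Hypothetical in-memory set of (subject, predicate, object) triples using PROV-O terms."""

    triples: set = field(default_factory=set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def objects(self, s: str, p: str) -> list:
        # All objects of triples matching (s, p, ?), sorted for deterministic output.
        return sorted(o for (s2, p2, o) in self.triples if s2 == s and p2 == p)

    def record_step(self, activity: str, inputs: list, output: str) -> None:
        # One integration step: the activity used its inputs and generated the output.
        self.add(activity, RDF_TYPE, PROV + "Activity")
        self.add(output, RDF_TYPE, PROV + "Entity")
        self.add(output, PROV + "wasGeneratedBy", activity)
        for entity in inputs:
            self.add(entity, RDF_TYPE, PROV + "Entity")
            self.add(activity, PROV + "used", entity)
            self.add(output, PROV + "wasDerivedFrom", entity)

    def generator(self, entity: str):
        # The activity that generated the entity, or None for an original source.
        found = self.objects(entity, PROV + "wasGeneratedBy")
        return found[0] if found else None

    def trace_sources(self, entity: str) -> set:
        # Goal (i): which original sources contributed to this entity?
        activity = self.generator(entity)
        if activity is None:
            return {entity}
        sources = set()
        for used in self.objects(activity, PROV + "used"):
            sources |= self.trace_sources(used)
        return sources

    def reconstruct_steps(self, entity: str) -> list:
        # Goal (iii): activities ordered so each step follows the steps it depends on.
        order = []

        def visit(e: str) -> None:
            activity = self.generator(e)
            if activity is None or activity in order:
                return
            for used in self.objects(activity, PROV + "used"):
                visit(used)
            order.append(activity)

        visit(entity)
        return order

    def to_ntriples(self) -> list:
        # Serialize the trail as N-Triples lines (all terms here are IRIs).
        return [f"<{s}> <{p}> <{o}> ." for (s, p, o) in sorted(self.triples)]


# Hypothetical two-step integration: merge two sources, then filter the result.
trail = ProvenanceTrail()
population = EX + "source/population"
gdp = EX + "source/gdp"
trail.record_step(EX + "step/merge", [population, gdp], EX + "data/merged")
trail.record_step(EX + "step/filter", [EX + "data/merged"], EX + "data/result")

print(sorted(trail.trace_sources(EX + "data/result")))
print(trail.reconstruct_steps(EX + "data/result"))
```

Keeping the trail as plain PROV-O triples means any RDF tool can read it; a practical system would more likely use an RDF library and a triple store than an in-memory set, and would add timestamps and agents (for example `prov:wasAssociatedWith`) to support repeatability checks.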

Published In

ACM Conferences
WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN: 9781450349512
DOI:10.1145/3106426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data processing
  2. linked data
  3. provenance

Conference

WI '17

Acceptance Rates

  • WI '17 paper acceptance rate: 118 of 178 submissions (66%)
  • Overall acceptance rate: 118 of 178 submissions (66%)

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0

Reflects downloads up to 11 Jan 2025.

Cited By

  • (2021) An Ontological Model and Services for Capturing and Tracking Provenance in Decentralized Social Networks. Proceedings of the Brazilian Symposium on Multimedia and the Web, 221-228. DOI: 10.1145/3470482.3479637. Online publication date: 5 Nov 2021.
  • (2020) Provenance-Aware Knowledge Representation: A Survey of Data Models and Contextualized Knowledge Graphs. Data Science and Engineering. DOI: 10.1007/s41019-020-00118-0. Online publication date: 8 May 2020.
  • (2019) Privacy-aware Linked Widgets. Companion Proceedings of The 2019 World Wide Web Conference, 508-514. DOI: 10.1145/3308560.3317591. Online publication date: 13 May 2019.
  • (2019) Enhancing Open Government Data With Data Provenance. Proceedings of the 11th International Conference on Management of Digital EcoSystems, 142-149. DOI: 10.1145/3297662.3365791. Online publication date: 12 Nov 2019.
  • (2018) Research on Data Provenance Model for Multidisciplinary Collaboration. Computer Supported Cooperative Work and Social Computing, 32-49. DOI: 10.1007/978-981-13-3044-5_3. Online publication date: 11 Dec 2018.