Automatic Discovery of High-Level Provenance Using Semantic Similarity

De Nies, Tom; Coppens, Sam; Van Deursen, Davy; Mannens, Erik; Van de Walle, Rik

doi:10.1007/978-3-642-34222-6_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7525)

Included in the following conference series:

International Provenance and Annotation Workshop

Abstract

As interest in provenance grows among the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance, or require that the user discloses formal workflows. In this paper, we propose a new approach for automatic discovery of provenance, at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation in many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
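
The idea of detecting entity derivations from semantic similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the `Document` type, the `discover_derivations` function, the example identifiers, and the threshold value are all hypothetical, and plain Jaccard overlap of concept sets stands in for whatever semantic similarity measure and clustering algorithm the chapter actually uses. The sketch assumes each document has already been annotated with a set of linked-data concepts and that the later document of a similar pair derives from the earlier one. The output uses the PROV-N form `wasDerivedFrom(derived, source)`.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Document:
    """A news item reduced to the features this sketch needs (hypothetical type)."""
    doc_id: str
    published: datetime
    concepts: frozenset  # assumed input: linked-data identifiers of detected entities


def jaccard(a, b):
    """Jaccard overlap of two concept sets; a stand-in for a semantic similarity measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def discover_derivations(docs, threshold=0.5):
    """Return PROV-N style statements for pairs whose similarity meets the threshold.

    Assumption: of two similar documents, the later one derives from the earlier one.
    """
    ordered = sorted(docs, key=lambda d: d.published)
    statements = []
    # combinations() preserves input order, so `earlier` always precedes `later`.
    for earlier, later in combinations(ordered, 2):
        if jaccard(earlier.concepts, later.concepts) >= threshold:
            statements.append(f"wasDerivedFrom({later.doc_id}, {earlier.doc_id})")
    return statements


# Made-up example data, for illustration only.
docs = [
    Document("ex:wire1", datetime(2012, 6, 1, 9, 0),
             frozenset({"dbpedia:Ghent", "dbpedia:Belgium", "dbpedia:Election"})),
    Document("ex:article1", datetime(2012, 6, 1, 12, 0),
             frozenset({"dbpedia:Ghent", "dbpedia:Belgium", "dbpedia:Election",
                        "dbpedia:Mayor"})),
    Document("ex:article2", datetime(2012, 6, 2, 8, 0),
             frozenset({"dbpedia:Football"})),
]

for statement in discover_derivations(docs):
    print(statement)  # prints: wasDerivedFrom(ex:article1, ex:wire1)
```

In this sketch the threshold acts as a lower bound on similarity: raising it keeps only near-duplicates, while lowering it admits looser derivations. Treating that knob as a way to move between levels of granularity is an assumption of the sketch, not a claim about the chapter's method.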



Author information

Authors and Affiliations

Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Gaston Crommenlaan 8 bus 201, B-9050, Ledeberg-Ghent, Belgium
Tom De Nies, Sam Coppens, Davy Van Deursen, Erik Mannens & Rik Van de Walle

Editor information

Editors and Affiliations

Free University Amsterdam, De Boelelaan 1105, 1081HV, Amsterdam, The Netherlands
Paul Groth
Bren School of Environmental Science and Management, University of California, 2400 Bren Hall, 93106-5131, Santa Barbara, CA, USA
James Frew

About this paper

Cite this paper

De Nies, T., Coppens, S., Van Deursen, D., Mannens, E., Van de Walle, R. (2012). Automatic Discovery of High-Level Provenance Using Semantic Similarity. In: Groth, P., Frew, J. (eds) Provenance and Annotation of Data and Processes. IPAW 2012. Lecture Notes in Computer Science, vol 7525. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34222-6_8

DOI: https://doi.org/10.1007/978-3-642-34222-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34221-9
Online ISBN: 978-3-642-34222-6
eBook Packages: Computer Science, Computer Science (R0)
