Abstract
As interest in provenance grows within the Semantic Web community, provenance is increasingly recognized as a useful tool across many domains. However, existing techniques for automatic provenance collection are not universally applicable: most either rely on low-level observed provenance or require the user to disclose formal workflows. In this paper, we propose a new approach for the automatic discovery of provenance at multiple levels of granularity. To accomplish this, we detect entity derivations using clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general so that it can be adapted to many use cases, we provide an implementation for one of them, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, with 68% precision. Lastly, we discuss possible improvements and future work.
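As a rough illustration of the idea summarized in the abstract, the following minimal sketch links documents whose similarity exceeds a threshold and emits the links as PROV-N style wasDerivedFrom statements. It is not the implementation described in the paper. Plain bag-of-words cosine similarity stands in for semantic similarity, and the clustering and linked data steps are omitted. The 0.5 threshold, the rule that a source must be older than the entity derived from it, the ex: identifiers and all function names are assumptions made for this example only.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a crude stand-in for semantic annotation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_derivations(docs, threshold=0.5):
    """docs maps an id to (timestamp, text).

    Returns a set of (derived, source) pairs. Assumption: a document can
    only be derived from a strictly older document that is similar enough.
    """
    vectors = {doc_id: Counter(tokens(text)) for doc_id, (_, text) in docs.items()}
    derivations = set()
    for later, (t_later, _) in docs.items():
        for earlier, (t_earlier, _) in docs.items():
            if t_earlier < t_later and cosine(vectors[later], vectors[earlier]) >= threshold:
                derivations.add((later, earlier))
    return derivations


def to_prov_n(derivations):
    """Render pairs as PROV-N statements: wasDerivedFrom(derived, source)."""
    return [f"wasDerivedFrom({d}, {s})" for d, s in sorted(derivations)]


def precision_recall(found, gold):
    """Precision and recall of the found pairs against a gold standard."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: a wire report, an article based on it, and an unrelated story.
docs = {
    "ex:wire": (1, "Council approves new harbour bridge budget"),
    "ex:article": (2, "City council approves budget for new harbour bridge"),
    "ex:sports": (3, "Local team wins the harbour cup final"),
}
found = discover_derivations(docs)
for statement in to_prov_n(found):
    print(statement)
```

The precision_recall helper mirrors the two kinds of figures reported in the abstract: recall is the share of true sources that were detected, and precision is the share of detected derivations that are correct.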
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Nies, T., Coppens, S., Van Deursen, D., Mannens, E., Van de Walle, R. (2012). Automatic Discovery of High-Level Provenance Using Semantic Similarity. In: Groth, P., Frew, J. (eds) Provenance and Annotation of Data and Processes. IPAW 2012. Lecture Notes in Computer Science, vol 7525. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34222-6_8
DOI: https://doi.org/10.1007/978-3-642-34222-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34221-9
Online ISBN: 978-3-642-34222-6
eBook Packages: Computer Science, Computer Science (R0)