Challenges for Provenance Analytics Over Geospatial Data

Garijo, Daniel; Gil, Yolanda; Harth, Andreas

doi:10.1007/978-3-319-16462-5_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8628)

Included in the following conference series:

International Provenance and Annotation Workshop

Abstract

The growing availability of geospatial data online, the increased use of crowdsourced maps and the advent of geospatial mash-ups have led to systems that deliver data to users after integration from many sources. In such systems, understanding the provenance of geospatial data is crucial for assessing the quality of the data and deciding on whether to rely on the data for decision making. To be able to use and analyze provenance in geospatial integration systems in a principled manner, we identify different levels of provenance in the geospatial domain, provide a set of provenance questions from the point of view of end users, and relate our geospatial provenance model to the W3C PROV recommendation.

1 Introduction

The Open Geospatial Consortium and the World Wide Web Consortium are working jointly towards standards for linking and integrating geospatial data [1]. As geospatial data is often used in decision making (e.g., navigation), the accuracy of integrated data is important. While we specifically cover provenance for geospatial information, some of these challenges are present in many other domains as well. The area of geospatial data integration is a prime scenario for provenance management, as the involved data and systems are complex and exhibit many challenging characteristics:

External sources: when integrating two geospatial datasets, an algorithm might consult other sources.
Human-in-the-loop processes: in some cases, the integration might involve manual intervention, such as checking particular values by seeking additional confirmation or even by direct observation on site ("eyes on target").
Crowdsourcing: datasets may have been collected from many small contributions, each of which should have provenance attached too.
Granularity: geospatial information may be represented at different levels of granularity in space; a geographical feature can be a point in space (e.g., a road intersection), a one-dimensional segment (e.g., a bridge that connects two points) or a two-dimensional region (e.g., a parking lot).
Computation: spatial reasoning may be needed to compute relationships between features; the integration system may have to integrate computed relations from different sources.
Versioning: maps are updated as the original data sources are updated. The objects in a map themselves can have multiple revisions.

We present an initial study on the requirements and challenges of tracking geospatial provenance, based on discussions with researchers and practitioners at several meetings and workshops on geospatial data.

2 Geospatial Provenance Model

Before we explain how to apply the W3C PROV standard model [2] to the geospatial domain, we present a classification of provenance levels on geospatial data:

Dataset-level provenance: provenance assertions about a map as a single entity. The map contains objects, and these objects contain properties and values, but provenance is associated with the map as a whole.
Object-level provenance: how different objects were created in the map.
Property-level provenance: enables us to answer questions about attributes and attribute values of objects shown in the map.
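
The three levels can be pictured as nested containers, each with its own slot for provenance. The following is a minimal sketch, not the authors' model: the class names, the fields of `ProvenanceRecord`, and the fallback rule in `explain` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical minimal record: which source and process produced an item."""
    source: str
    process: str


@dataclass
class MapObject:
    object_id: str
    properties: Dict[str, str] = field(default_factory=dict)
    # Object-level provenance: how this object was created in the map.
    provenance: Optional[ProvenanceRecord] = None
    # Property-level provenance: at most one record per property name.
    property_provenance: Dict[str, ProvenanceRecord] = field(default_factory=dict)


@dataclass
class MapDataset:
    dataset_id: str
    objects: List[MapObject] = field(default_factory=list)
    # Dataset-level provenance: a single record for the map as a whole.
    provenance: Optional[ProvenanceRecord] = None


def explain(dataset: MapDataset, object_id: str, prop: str) -> Optional[ProvenanceRecord]:
    """Return the most specific record available: property, then object, then dataset."""
    for obj in dataset.objects:
        if obj.object_id == object_id:
            return obj.property_provenance.get(prop) or obj.provenance or dataset.provenance
    raise KeyError(f"unknown object: {object_id}")
```

Falling back from property to object to dataset is one possible design choice; it lets a system answer a property-level question with coarser provenance when finer records were not kept.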

Modeling detailed provenance across all levels presents a challenge of scale. Maps can have millions of objects, and if we represented each of the integration processes for each object, the amount of information could become larger than the map itself, especially if we assume updates at regular intervals. Property-level provenance aggravates the scale issues of object-level provenance.
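
To make the scale argument concrete, the sketch below counts provenance records per level under a deliberately naive assumption: one record per item per version, with no sharing or compression. The function name and the example figures are hypothetical and are not measurements from any real map.

```python
def provenance_record_counts(num_objects: int, props_per_object: int, num_versions: int) -> dict:
    """Count records per level, assuming one record per item per version (naive upper bound)."""
    return {
        "dataset": num_versions,
        "object": num_objects * num_versions,
        "property": num_objects * props_per_object * num_versions,
    }


# Hypothetical map: one million objects, ten properties each, twelve versions.
counts = provenance_record_counts(1_000_000, 10, 12)
print(counts)
```

Under these assumptions the property level multiplies the object-level count by the number of properties per object, which is why it aggravates the scale problem rather than merely adding to it.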

In Fig. 1, we list user questions concerning geospatial provenance, grouped according to our provenance model for geospatial data.

Applying PROV to the geospatial domain is straightforward for dataset-level and object-level provenance, as dataset and object identifiers can serve as handles to which provenance records are attached. Property-level provenance requires a more involved approach, as properties are typically accessed through the object and cannot be referenced as separate entities. Therefore, we would need either to create a new identifier for each property assertion, or to repeat the property assertion itself so that the provenance record can be attached to it. Tracking objects or values that appear and disappear across versions would require storing the entire history of all datasets, including their provenance records.
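
The first option, creating an identifier for each property assertion, can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed method: the `ex:` prefix, the hash-based naming scheme, and the attribute names `ex:object`, `ex:property` and `ex:value` are hypothetical, and values are assumed not to contain double quotes. The output strings follow the PROV-N notation for `entity` and `wasDerivedFrom`.

```python
import hashlib
from typing import List


def assertion_id(object_id: str, prop: str, value: str) -> str:
    """Mint a deterministic identifier for one property assertion (hypothetical scheme)."""
    digest = hashlib.sha1(f"{object_id}|{prop}|{value}".encode("utf-8")).hexdigest()
    return f"ex:assertion-{digest[:12]}"


def prov_for_property(object_id: str, prop: str, value: str, source_id: str) -> List[str]:
    """Emit PROV-N statements that treat a property assertion as its own entity."""
    aid = assertion_id(object_id, prop, value)
    return [
        # The assertion becomes an entity that records which object and value it is about.
        f'entity({aid}, [ex:object="{object_id}", ex:property="{prop}", ex:value="{value}"])',
        f"entity({source_id})",
        # Provenance is attached to the assertion, not to the object as a whole.
        f"wasDerivedFrom({aid}, {source_id})",
    ]


for statement in prov_for_property("ex:bridge42", "name", "Old Bridge", "ex:sourceA"):
    print(statement)
```

Because the identifier is derived from the object, property and value, a changed value in a later version yields a different identifier, so each version of an assertion can carry its own provenance record.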

References

1. Archer, P.: Joint W3C/OGC Workshop on Linking Geospatial Data, March 2014. http://www.w3.org/2014/03/lgd/
2. Moreau, L., Missier, P.: PROV-DM: The PROV Data Model (2012). http://www.w3.org/TR/prov-dm/

Author information

Authors and Affiliations

Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
Daniel Garijo
Information Sciences Institute, University of Southern California, Los Angeles, USA
Yolanda Gil
Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany
Andreas Harth

Corresponding author

Correspondence to Daniel Garijo.

Editor information

Editors and Affiliations

University of Illinois, Urbana-Champaign, USA
Bertram Ludäscher
Indiana University, Bloomington, USA
Beth Plale

About this paper

Cite this paper

Garijo, D., Gil, Y., Harth, A. (2015). Challenges for Provenance Analytics Over Geospatial Data. In: Ludäscher, B., Plale, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2014. Lecture Notes in Computer Science, vol 8628. Springer, Cham. https://doi.org/10.1007/978-3-319-16462-5_28

DOI: https://doi.org/10.1007/978-3-319-16462-5_28
Published: 21 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16461-8
Online ISBN: 978-3-319-16462-5
eBook Packages: Computer Science, Computer Science (R0)
