Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Link Search Menu Expand Document

Using Virtual Graphs

Learn how to create and query virtual graphs over relational data in Stardog.

Page Contents
  1. Introduction
  2. Prerequisites
    1. RDBMS Setup
    2. JDBC Setup
    3. Database Setup
    4. Virtual Graph Configuration
  3. Creating a Virtual Graph
    1. Querying a Virtual Graph
  4. Importing a Virtual Graph
  5. Conclusion

Introduction

In the Virtual Graph Mappings tutorial we learned how to write mappings that map relational data to the Knowledge Graph. In this tutorial we will learn how to create a virtual graph in Stardog using these mappings and how to query the virtual graph using SPARQL as if the data was stored as an RDF graph in Stardog.

Prerequisites

If you would like to follow the steps in this tutorial, there are some initial setup steps to perform as explained in the following sections.

RDBMS Setup

In order to query a virtual graph over relational data we first need to have some relational data. We provide the SQL files necessary to create the tables introduced in the mappings tutorials and insert some sample data into those tables.

The file music_schema.sql creates empty tables and should be executed first. The file beatles.sql populates these tables with a few rows about The Beatles and should be executed next.

Note that these generic SQL files might need to be modified to match the SQL dialect of the RDBMS you are using.

JDBC Setup

Stardog does not ship with JDBC drivers so you need to manually copy the JAR file containing the driver for your RDBMS into the classpath of the Stardog server. The preferred way of doing this is by setting the STARDOG_EXT environment variable to point to a directory outside the Stardog installation directory and copy the jar file there:

$ export STARDOG_EXT=/opt/stardog/ext
$ mkdir $STARDOG_EXT
$ cp JDBC_JAR_FILE $STARDOG_EXT

Note that the environment variable needs to be set before the Stardog server is started. If the variable was already set before the server is started then the JAR files can be copied to that location while the server is running without a restart.

Database Setup

In order to query a virtual graph (or any graph for that matter), we need to have a database in Stardog. In this tutorial we will start with an empty database since the data will be stored externally in the RDBMS. We can create an empty database in the usual way:

stardog-admin db create -n music

We will set the default namespace for the database upfront so that we don’t have to worry about prefix declarations later on:

stardog namespace add --prefix "" --uri "http://stardog.com/tutorial/" music

Virtual Graph Configuration

In order to create a virtual graph we need a configuration file that will provide the JDBC URL to access the RDBMS along with the required credentials. This file should be in the Java properties file format. An example configuration file for a MySQL database running on the same machine as the Stardog server would look like this:

jdbc.url=jdbc:mysql://localhost/music
jdbc.username=admin
jdbc.password=admin
jdbc.driver=com.mysql.jdbc.Driver

An alternative way to provide credentials is to use the password file mechanism as explained in the Stardog docs. If the password file is used, then credentials can be omitted from the configuration file. There are also additional properties that can be set in the configuration file.

Creating a Virtual Graph

Creating a virtual graph in Stardog is done with the virtual add command by specifying the configuration file and the mappings file as input parameters:

stardog-admin virtual add music.properties music_mappings.sms

The name of the configuration file is used as the virtual graph name so this command will create a virtual graph named music. We do not specify the database name in this command because the virtual graphs are defined globally in the Stardog server. Once they are registered with the server, they can be accessed via any Stardog database as allowed by the access rules.

Querying a Virtual Graph

Virtual graphs can be queried using the GRAPH clause in SPARQL similar to regular named graphs. The IRI of the virtual graph uses a special IRI scheme virtual:// so the IRI for the music virtual graph we created is simply virtual://music.

The query to retrieve the members of The Beatles from the virtual graph looks like this:

SELECT ?member ?name {
   GRAPH <virtual://music> {
      ?band a :Band ;
            :name "The Beatles" ;
            :member ?member .
      ?member :name ?name .
   }
}

We can execute this query using the regular query execution command (either by saving the query string in a file or by passing the query string as a parameter):

stardog query execute music beatles_members.sparql

The result for this query will look like this:

member name
:Artist1 John Lennon
:Artist2 Paul McCartney
:Artist3 Ringo Starr
:Artist4 George Harrison

The IRIs for The Beatles members are automatically generated using the template specified in the mappings which appends the id field from the Artist table to the artist namespace.

Suppose we execute the following update query that will store the birth date of the Beatles’ members in Stardog natively:

INSERT DATA {
   :Artist1 :birthDate "1940-10-09"^^xsd:date .
   :Artist2 :birthDate "1942-06-18"^^xsd:date .
   :Artist3 :birthDate "1940-07-07"^^xsd:date .
   :Artist4 :birthDate "1943-02-25"^^xsd:date .
}

We can write a query that will combine the data from the virtual graph with the birthdate nodes and edges stored directly in Stardog.

SELECT ?member ?name ?birthDate {
   GRAPH <virtual://music> {
      ?band a :Band ;
            :name "The Beatles" ;
            :member ?member .
      ?member :name ?name .
   }
   ?member :birthDate ?birthDate
}

In many cases the information about resources will be combined from multiple virtual and local graphs but there won’t be a shared identifier for the entities in different graphs as each data source uses different local identifiers. In this tutorial we simply reused the identifier from our single virtual graph to assert additional information manually, but in real world applications more sophisticated methods are needed to link entities from different data sources. We will discuss these techniques in a future tutorial.

We can run more complicated queries involving filters or aggregation over virtual graphs and the corresponding elements will be automatically translated to SQL by Stardog behind the scenes. For example, we can compute the number of albums released every year:

SELECT ?year (count(?album) AS ?count)
FROM <virtual://music> {
      ?album a :Album ;
             :date ?date ;
      BIND (year(?date) AS ?year)
}
GROUP By ?year

For this query we used the FROM clause instead of the GRAPH clause which is another way to query named graphs in SPARQL and works with virtual graphs too.

We can look at the query plan Stardog generates to see the exact SQL generated by Stardog:

$ stardog query explain music albums_by_year.sparql
...
The Query Plan:

From <virtual://music>
Projection(?year, ?count) [#1]
`─ VirtualGraphSql<virtual://music> [#1] {
   +─    SELECT YEAR(`release_date`) AS `BIND_year`, COUNT(*) AS `count`
   +─    FROM `Album`
   +─    WHERE `release_date` IS NOT NULL
   +─    GROUP BY YEAR(`release_date`)
   }

As the query plan shows, Stardog pushes functions and aggregation to the SQL to execute the query efficiently.

Importing a Virtual Graph

In some cases you need to materialize the information stored in an RDBMS directly into Stardog. In one bulk operation all the contents of the relational data source can be mapped to RDF and added into the Stardog database. In some cases query performance over materialized graphs will be better as the data will be indexed locally by Stardog, but materialization may not be practical in cases where frequency of change is very high.

If the mappings for the virtual graph are already registered, then a regular SPARQL update query can be used for importing the virtual graph:

COPY <virtual://music> TO :importGraph

The COPY query clears the contents of the target graph :importGraph first so if the RDBMS contents change over time this command can be used to repopulate the materialized graph in the future. If the import operation will be done only once then the virtual import command can be used to import the contents of the RDBMS into Stardog without registering the virtual graph first.

Conclusion

Stardog’s Virtual Graph capability provides extreme flexibility by exposing relational data as a graph without moving the data out of the RDBMS. In cases where this is not possible due to performance or quality of service constraints, the same logical mappings used for virtualization can be used to materialize the data in Stardog allowing users to easily switch between two different modes of operation.