This project contains plugins for Pentaho Data Integration (commonly known as KETTLE) that use Apache Jena to add functionality for producing RDF.
The plugins provided are:
- Create Jena Model: This transform plugin creates a Jena Model for each row sent to it. Each row becomes a Resource, and the plugin lets you map fields to RDF Literals or Resources. It also supports constructing Blank Nodes within Resources.
- Combine Jena Models: This transform plugin merges multiple Jena Models within the same row into a single model. This can be thought of as a horizontal transformation within a row.
- Group Merge Jena Models: This transform plugin performs a Group By operation across consecutive rows, merging the Jena Models held in those rows into a single model in a single row. This can be thought of as a vertical transformation across rows.
- Serialize Jena Model: This output plugin takes the output of the Create Jena Model plugin and serializes it to an RDF file on disk. It supports the Turtle, N3, N-Triples, and RDF/XML output formats.
- SHACL Validation: This validation plugin validates a Jena Model created by the Create Jena Model plugin against a SHACL shapes file loaded from the file system.
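To make the row-to-RDF data flow more concrete, here is a minimal plain-Python sketch of what Create Jena Model and Combine Jena Models do conceptually. It is not the plugins' code and does not use Apache Jena: a "model" is assumed to be a Python set of (subject, predicate, object) triples written in N-Triples syntax, and the helper names (row_to_model, combine_models, to_ntriples), the mapping format, and the example.org IRIs are all hypothetical.

```python
from itertools import count

# Hypothetical counter used to label blank nodes (_:b1, _:b2, ...)
_blank_ids = count(1)


def iri(value):
    """Render an IRI as an N-Triples term."""
    return "<" + value + ">"


def literal(value):
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + text + '"'


def term(row, field, kind):
    """Map a field value to a Resource (IRI) or to a Literal."""
    return iri(row[field]) if kind == "resource" else literal(row[field])


def row_to_model(row, subject_field, base_iri, mappings, blank_node=None):
    """Build a model (a set of (s, p, o) triples) for one row.

    mappings:   {field: (predicate_iri, "literal" | "resource")}
    blank_node: optional (predicate_iri, mappings) whose fields are attached
                to a new blank node instead of directly to the row's Resource
    """
    subject = iri(base_iri + str(row[subject_field]))
    model = set()
    for field, (predicate, kind) in mappings.items():
        model.add((subject, iri(predicate), term(row, field, kind)))
    if blank_node is not None:
        link_predicate, nested = blank_node
        node = "_:b" + str(next(_blank_ids))
        model.add((subject, iri(link_predicate), node))
        for field, (predicate, kind) in nested.items():
            model.add((node, iri(predicate), term(row, field, kind)))
    return model


def combine_models(*models):
    """Merge several models held in the same row: a set union of their triples."""
    return set().union(*models)


def to_ntriples(model):
    """Serialize a model as N-Triples, one triple per line, sorted for stable output."""
    return "".join(s + " " + p + " " + o + " .\n" for s, p, o in sorted(model))
```

For example, a row {"id": "42", "title": "Minutes", "street": "1 Kew Rd"} mapped with the base IRI http://example.org/record/ gives the Resource <http://example.org/record/42> with a literal title, plus a blank node carrying the street. Combining two such models is simply a union of their triples; the real plugins work with Jena Model objects rather than Python sets.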
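The Group Merge Jena Models behaviour can be sketched the same way. This is a minimal plain-Python illustration, not the plugin's implementation: each row is assumed to be a dict carrying its model as a set of triples, and the group_merge function and its key_field/model_field parameters are hypothetical. Python's itertools.groupby groups only adjacent rows with equal keys, which matches the "consecutive rows" behaviour described for this plugin.

```python
from itertools import groupby


def group_merge(rows, key_field, model_field):
    """Merge the models of consecutive rows that share the same key value.

    Each row is a dict whose model_field holds a set of triples. Only
    adjacent rows are grouped: a key that reappears later starts a new
    group. Returns one (key, merged_model) pair per group.
    """
    merged = []
    for key, run in groupby(rows, key=lambda row: row[key_field]):
        model = set()
        for row in run:
            model |= row[model_field]  # union this row's triples into the group's model
        merged.append((key, model))
    return merged
```

Because only consecutive rows are grouped, rows for keys A, A, B, A produce three groups (A, B, A) in this sketch, so input would normally be sorted on the grouping key before reaching such a step.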
This project was developed by Evolved Binary and DeveXe as part of Project OMEGA for the National Archives.
You can either download the plugins from our GitHub releases page: https://github.com/nationalarchives/kettle-jena-plugins/releases/, or you can build them from source.
To build the plugins from source, install the following prerequisites and then follow the steps below.
- Apache Maven, version 3+
- Java JDK 1.8
- Git
- Clone the Git repository:
  $ git clone https://github.com/nationalarchives/kettle-jena-plugins.git
- Compile a package:
  $ cd kettle-jena-plugins
  $ mvn clean package
- The plugins directory is then available at:
  target/kettle-jena-plugins-1.0.0-SNAPSHOT-kettle-plugin/kettle-jena-plugins
Tested with Pentaho Data Integration Community Edition, version 9.1.0.0-324.
To install, copy the kettle-jena-plugins directory (from the build above) into the plugins sub-directory of your KETTLE installation.
You can do this either by running the following (adjust /opt/data-integration to the location of your KETTLE installation):
$ mvn -Pdeploy-pdi-local -Dpentaho-kettle.plugins.dir=/opt/data-integration/plugins antrun:run@deploy-to-pdi
or by copying it manually, e.g.:
$ cp -r target/kettle-jena-plugins-1.0.0-SNAPSHOT-kettle-plugin/kettle-jena-plugins /opt/data-integration/plugins/
We wrote a short blog post about working with the plugins: https://blog.adamretter.org.uk/rdf-plugins-for-pentaho-kettle/
We also created a short screencast, hosted on YouTube, demonstrating how to use the plugins in Pentaho Kettle.