Ref. Ares(2020)476621 - 25/01/2020

QROWD - Because Big Data Integration is Humanly Possible
Innovation action

D4.2 – Data acquisition framework

Author/s: Tomás Pariente, Nines Sanguino, Javier Villazán, Ricardo Ruiz (ATOS)
Reviewed by: Daniel Hladky (AI4BD)
Due date: 30.11.2018 (Updated on 30.11.2019)
Version: 2.0
Dissemination level: PU
Status: Final
Changes: Changes made to the document are explained in section 1.2.
Data acquisition framework

Project co-funded under Horizon 2020 Research and Innovation EU programme, grant agreement no. 732194

Table of contents
ABSTRACT

EXECUTIVE SUMMARY

INTRODUCTION
Overview and structure of the Document
Tracked changes

GENERAL DATA ACQUISITION FRAMEWORK
Data acquisition and integration in context
Data acquisition in QROWD
Intended users
Technology stack
NIFI
OASC
Context Broker
CKAN

STATIC DATA ACQUISITION
Description of data
Data flow
CKAN General considerations
CKAN Structure: Datasets, resources, metadata and naming conventions
Versioning
Ownership
Core static data acquisition functionality
CSDA-1: Upload/Update a dataset into CKAN (from a URL)
Implementation with NiFi
CSDA-2: Upload/Update a dataset into CKAN (from a URL + decompress)
Implementation with NiFi
CSDA-3: JSON-based transformations + Update a dataset into CKAN (without versioning)
Implementation with NiFi
CSDA-4: External-script transformations + Update a dataset into CKAN (without versioning)
Implementation with NiFi
Guided procedures
General recommendations
CSDA-1: Upload/Update a dataset into CKAN (from a URL)

D4.2 Page 2 of 61
CSDA-2: Upload/Update a dataset into CKAN (from a URL + decompress)
CSDA-3: JSON-based transformations + Update a dataset into CKAN
CSDA-4: External-script transformations + Update a dataset into CKAN
Application of static core templates: Integrated acquisition and FIWARE/GeoJSON transformation of Municipality of Trento datasets
Deployment and release of the framework
Deployment
Release
Static Requirements Validation

DYNAMIC DATA ACQUISITION
Datasets
Data flow
Context Broker general considerations
Core dynamic data acquisition functionality
CDDA-5: JSON based transformation + Update to Context broker
Guided Procedure
Deployment of the framework
Deployment
Release
Dynamic Requirements Validation

CONCLUSIONS

REFERENCES

ANNEX 1: Deployed dataflows
1030
1031

ABSTRACT
This document presents the main building blocks of the QROWD Data Acquisition
Framework. The framework is based on the definition of several data flows created by
combining Apache NiFi templates generated in the scope of QROWD. The
document describes how new datasets can be acquired into CKAN (static datasets)
or uploaded to the Orion Context Broker (dynamic or streaming data) for further use.


EXECUTIVE SUMMARY
This document is the second deliverable of QROWD WP4 and presents the main
building blocks of the QROWD Data Acquisition Framework for static and dynamic
datasets. It is intended mainly for developers of data-enabled applications that would
like to make use of a data portal (CKAN as selected implementation) and a broker
(the FIWARE Orion Context Broker as selected implementation) as data repositories.
The reported framework is set in the context of the QROWD high-level architecture
described in D8.1.

The document explains the rationale behind the data acquisition framework. The
framework is based on the definition of data flows created by combining Apache
NiFi templates generated in the scope of QROWD. The templates serve
different purposes, ranging from uploading new datasets to CKAN (static datasets)
to pushing the latest status of dynamic or streaming data to the Orion Context
Broker. Combining those templates allows
developers to implement new data flows easily through the user interface provided
by NiFi, with little or no programming effort.

The deliverable also explains the methodology and best practices to create data
flows and perform simple data transformations in NiFi. The document shows how
this is done to generate datasets in multiple formats and flavours and upload them to
CKAN.

The document provides an overview of how the versioning of datasets is handled in
CKAN for the purposes of QROWD.

This document is accompanied by software related to the data acquisition
framework (NiFi templates and processes), as well as several instantiations that allow
the acquisition of many of the datasets listed in deliverable D4.1 (Data Catalog)
that are necessary for the WP2 Trento pilot. It also provides input to QROWD developers,
especially from WP3 and WP6, on how to upload data to CKAN and fulfil the use cases required
by the pilots.


1. INTRODUCTION
1.1. Overview and structure of the document

This document is the second deliverable of WP4. It reports on the Data
Acquisition Framework developed in the scope of QROWD. In particular, the
document reports on the results of task 4.2 (Data acquisition) and, as such,
provides the methods and software artefacts needed to fulfil the requirements for data
acquisition gathered from the pilot use cases (WP1 and WP2). This involves the
development and deployment of the data acquisition functionalities originally
specified in deliverable D4.1 [D4.1] and refined after several iterations with the
project partners.

The document is structured as follows:

● Section 1 introduces the document.
● Section 2 explains the general data acquisition framework.
● Section 3 reports on the acquisition of static datasets.
● Section 4 introduces the acquisition of dynamic datasets.
● Section 5 summarises and concludes the document.

1.2. Tracked changes

This document was revised in November 2019 to keep the catalog up to date with
the latest information available on the datasets at the end of the project.
Besides cosmetic changes, the main updates are the following:

● New NiFi pipelines added and existing ones updated in "ANNEX 1: Deployed dataflows".


2. GENERAL DATA ACQUISITION FRAMEWORK


2.1. Data acquisition and integration in context

This section provides the context of the data acquisition framework in relation to the
QROWD architecture.

Figure 1: QROWD general architecture

Figure 1 shows the QROWD architectural diagram as described in QROWD
deliverable D8.1. The Data Acquisition Framework supports gathering
static and dynamic datasets from the different pilots and storing that information either
in the CKAN [CKAN] (static) or in the Context Broker [FIWARE Context Broker]
(dynamic) repository.

2.2. Data acquisition in QROWD


This deliverable presents the data acquisition framework for the collection of
QROWD-relevant data. According to the definition given in deliverable D4.1, the
acquisition framework should allow the extraction of data from distributed and
heterogeneous sources and make it available for further use in the project. A full
identification and assessment of the datasets to be acquired by QROWD was carried out
in D4.1, and more specifically in the so-called Data Catalog that resulted from it:
the QROWD Live Data Catalog (LDC).

This document, however, describes only the main data acquisition flows in QROWD.
These flows relate to the main static data collection, described in Section 3 of
this document, and to the dynamic data collection, described in Section 4. Other
QROWD work packages collect external data for their own internal
purposes, such as:
● Real-time or mobile phone sensor data acquisition, described in D2.4 “iLOG”.
● News/Social Media streaming acquisition mechanisms, fully described in D4.4
“Crowdsourced multilingual data harvesting and extraction framework”.

Figure 2 shows a general view of the acquisition framework in the context of static
and dynamic data acquisition. As depicted in the figure, other data flows, such as
those coming from city sensors, might be supported by the framework with the inclusion of
FIWARE IoT Agents.

Figure 2: Main QROWD acquisition processes explained in the document

For the static and dynamic data acquisition framework presented in this deliverable,
the following mechanisms are made available to assist the QROWD data value chain:
● A set of acquisition and transformation components based on mature open-source
technologies: NiFi [Apache NiFi], CKAN [CKAN], and the FIWARE Context Broker
[FIWARE Context Broker] (see Sections 2.4.1, 2.4.2.2 and 2.4.2.1 respectively for more details).
● A set of guidelines for their usage and parameterization.

In the context of integration, a set of components that allow data format
transformation is also included as part of the framework. See Section 3.4 for more
details.

2.3. Intended users

The presented framework provides users and processes with data acquisition
mechanisms for collecting data of different natures (static and dynamic/streaming)
from different sources (i.e. services and repositories external to QROWD, or
internal processes) and for making them available to other QROWD processes and other
actors.

In particular, the static data framework aims to provide users and processes with
acquisition facilities such as:
● Uploading new datasets to the central data repository, CKAN. These datasets
may come from different sources of information:
○ Datasets coming from the Municipality of Trento services
○ The VCE tool (see D3.2 Crowdsourcing services)
○ The QROWD Fusion and Interlinking process from QROWD WP5
○ Available datasets such as OpenStreetMap [OpenStreetMap]
○ Other sources
● Uploading new versions of existing datasets (versioning/backup)
● Transforming existing datasets and uploading them in different formats. Some of
the formats managed in QROWD are:
○ FIWARE format for data integration/homogenization purposes. The
FIWARE data models¹ [FIWARE Data Models] are a set of
harmonized data models for smart city applications.
○ GeoJSON [H. Butler et al, 2016] for visualization purposes
○ RDF² [World Wide Web Consortium, 2014] format for analytical
purposes

On the other hand, the dynamic data acquisition framework uses the Orion Context
Broker (FIWARE) to persist contextual information about the state of several assets
of a city (e.g. parking lots). Consequently, the dynamic acquisition framework is
intended for processes that need to receive information about the real-time
status of the city and act on it, such as the visualization of real-time data
in the Municipality and Citizen dashboards.

2.4. Technology stack


2.4.1. NIFI

1 https://www.fiware.org/developers/data-models/
2 https://www.w3.org/RDF/


Apache NiFi is a data platform developed by the Apache Software Foundation
to move data between different systems, both in real time and on a schedule,
through an intuitive, easy-to-use graphical interface. It was designed to
tackle some of the most relevant issues in the industry when integrating data across
different systems.

NiFi’s base unit of work is called a FlowFile. A FlowFile encapsulates the
data to be processed together with some extra attributes that describe that data (such as
filename or last update time). FlowFiles are modified and routed by
processors.

This way of handling the data gives NiFi, out of the box, some useful features:
● It is highly scalable: it can be clustered and easily scaled horizontally, so that each
FlowFile can be processed in a different system.³
● It supports back pressure⁴. The amount of data that is queued because the
target system is not able to handle it is configurable, and the source system
can be told to hold the ingestion.
● Provenance⁵ of data is stored, so you can know which processors modified
that data.

In a nutshell, NiFi moves the data in the FlowFile from processor to processor
through the connections that link them. It can also route FlowFiles
through different paths depending on their attributes or on the data itself.
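
The FlowFile and processor concepts described above can be pictured with a small Python sketch. It only illustrates the idea and is not NiFi's API: the `FlowFile` class, the two processor functions and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Conceptual stand-in for a NiFi FlowFile: content plus descriptive attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# In this sketch a "processor" is just a function that receives and returns a FlowFile.
Processor = Callable[[FlowFile], FlowFile]


def set_filename(flowfile: FlowFile) -> FlowFile:
    # Adds an attribute without touching the content (hypothetical file name).
    flowfile.attributes["filename"] = "bike_racks.json"
    return flowfile


def route_by_size(flowfile: FlowFile) -> FlowFile:
    # Routing decision based on the data itself, recorded as an attribute.
    flowfile.attributes["route"] = "large" if len(flowfile.content) > 10 else "small"
    return flowfile


def run_flow(flowfile: FlowFile, processors: List[Processor]) -> FlowFile:
    # Moves the FlowFile from processor to processor, in order.
    for processor in processors:
        flowfile = processor(flowfile)
    return flowfile


result = run_flow(FlowFile(content=b'{"id": 1}'), [set_filename, route_by_size])
```

In real NiFi the connections between processors are queues configured in the user interface; the plain list used here is only a simplification.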

On top of this engine, Apache NiFi makes it possible to create custom
processors for any functionality that is not covered by the
standard set of processors bundled with it. Anyone can implement
the required interfaces and create a processor that, for example, reads or writes
data to a CKAN system. Such a processor is integrated in the NiFi platform, so the
developer does not have to code most of the functionalities that NiFi offers:
they are added seamlessly to the custom processor.

2.4.2. OASC
2.4.2.1. Context Broker

As a part of FIWARE, the Context Broker is one of the most important components of
this framework. As seen in D4.1 “Data Catalog”, one of the major
impacts pursued by the QROWD project is the replicability/reusability of its results.
FIWARE is an open source platform that allows us to communicate with
other technologies to create a better data ecosystem. This yields an optimized data
flow that improves the use of the data.

3 http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering
4 https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#back-pressure
5 https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data_provenance


The Context Broker is based on the implementation of the NGSI9/NGSI10 information
model [NGSI-9/NGSI-10 information model]. “The NGSI Context Management
informational Specifications are defined by the OMA(Open Mobile Alliance)”.
The basic NGSI information model is composed of:

● Entities. The virtual representation of a real-life object. Entities
have an identifier and a type.
● Attributes. Information about a feature of an entity; attributes can also contain
metadata.
● Domain Attributes. A way to create sets of elements and group attributes
with a similar logic.
● Context elements. “The data structure used for exchanging information about
entities”.

The Context Broker is used in data-driven projects
as an intermediate component that connects data producers and data consumers. Its
main capabilities can be summarised as follows:
i) Register elements of the context.
ii) Manage these elements: query and update them.
iii) Subscribe to these elements, which allows us to receive a notification
when, for example, the data changes.

The Context Broker is well suited to working with many different kinds of data, and is
also adapted to geolocation data and IoT (Internet of Things) data, which are the kinds
of data used to define what we call a “Smart City”.
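
To make the entity/attribute model and the register, query, update and subscribe capabilities more concrete, the following Python sketch uses a tiny in-memory broker. It is an assumption-laden illustration, not the Orion Context Broker API: the `TinyBroker` class, the entity identifier and the attribute names are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


class TinyBroker:
    """In-memory illustration of register / query / update / subscribe."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._subscribers: Dict[str, List[Callable[[Entity], None]]] = {}

    def register(self, entity: Entity) -> None:
        # Entities are indexed by their identifier.
        self._entities[entity["id"]] = entity

    def query(self, entity_id: str) -> Entity:
        return self._entities[entity_id]

    def subscribe(self, entity_id: str, callback: Callable[[Entity], None]) -> None:
        self._subscribers.setdefault(entity_id, []).append(callback)

    def update(self, entity_id: str, attribute: str, value: Any) -> None:
        # Change one attribute value, then notify every subscriber of that entity.
        entity = self._entities[entity_id]
        entity[attribute]["value"] = value
        for callback in self._subscribers.get(entity_id, []):
            callback(entity)


# An entity has an identifier and a type; each attribute carries a value and optional metadata.
parking = {
    "id": "ParkingLot:001",  # hypothetical identifier
    "type": "ParkingLot",    # hypothetical type
    "availableSpots": {"value": 12, "type": "Number", "metadata": {"unit": "spots"}},
}

broker = TinyBroker()
broker.register(parking)

notifications: List[int] = []
broker.subscribe("ParkingLot:001", lambda e: notifications.append(e["availableSpots"]["value"]))
broker.update("ParkingLot:001", "availableSpots", 7)
```

A dashboard would play the role of the subscriber here: it is notified whenever a producer updates the state of the parking lot.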

2.4.2.2. CKAN

Several definitions of CKAN can be found on the official website⁶. One of them states
that “CKAN is a fully-featured, mature, open source data portal and data
management solution”, which helps explain why CKAN is widely used in Europe as
a data catalog that allows data publishers to share collections of data with general
users.

In particular, it provides features such as:

● An open-source platform that can be easily adapted and extended
● An intuitive web interface to manage datasets and organizations⁷
● An API that allows developers to interact with CKAN⁸
● A rich set of metadata for dataset annotation
● Data storage for structured data⁹ and data preview¹⁰
● Search and discovery, geospatial features and more functionalities¹¹

6 https://ckan.org/about/
7 https://ckan.org/portfolio/publish-and-manage-data/
8 https://docs.ckan.org/en/latest/api/index.html
9 https://ckan.org/portfolio/datastore/
10 https://ckan.org/portfolio/visualization/
11 https://ckan.org/features/


In addition, the Open and Agile Smart City initiative (OASC) [OASC], which
aims to provide best practices for the construction of smart city systems, relies on
CKAN as one of its main pillars. It proposes the CKAN platform as the base standard
platform for the publication of static file datasets¹².

The recommendation of the OASC initiative, together with the functionalities
described above, contributed to the selection of CKAN as the main repository for static datasets in
QROWD.

The following section gives more details about the particular use of CKAN
in QROWD and the operating procedure defined.

3. STATIC DATA ACQUISITION


This section presents the “Acquisition Components of Static Data” framework
(ASCD), specified in D4.1 and fully implemented in D4.2.
3.1. Description of data

The framework described in the following sections encompasses the acquisition of
datasets of a static nature.

As stated in D4.1, “static data, also known as data-at-rest, is data that
does not or barely change after its recording”. In the context of
QROWD, it refers to datasets related to the infrastructure of the city, e.g. bike
racks, paid parking zones, parking for people with disabilities, bike lanes, schools, libraries,
e-car charging stations, etc.

In the context of QROWD, there is an additional classification of datasets according to
their provenance:
● Datasets of type “Source”: original datasets coming from internal or external
data sources, e.g. datasets coming from the Municipality of Trento, resulting
datasets coming from the VCE component, OpenStreetMap datasets, etc.
● Datasets of type “Intermediate”: datasets resulting from internal fusion
processes which make use of “source” datasets.
● Datasets of type “Final”: datasets resulting from internal validation processes
which make use of “intermediate” datasets.

12 http://oascities.org/wp-content/uploads/2016/02/Open-and-Agile-Smart-Cities-Background-
Document-3rd-Wave.pdf


Figure 3: QROWD datasets

It is worth mentioning that the existence of different datasets classified as source,
intermediate or final does not affect the use of the data acquisition framework. From
an acquisition point of view, it is transparent whether a dataset comes from the
Municipality or from a fusion process. The only distinction between them is the
publisher reflected in the name of the dataset and the tag associated with the dataset:
“Type dataset”: {QROWD_source, QROWD_fusion, QROWD_official}

3.2. Data flow

Figure 4 shows the architecture used to acquire, update and transform static datasets in
CKAN. The acquisition and transformation NiFi box is fully
explained in Section 3.4.

Figure 4: QROWD static data acquisition framework


Based on the considerations described in Section 2.4.2.2, CKAN is used as the main
repository for the management and storage of static datasets in QROWD. All the
acquisition and transformation processes for static data sink into CKAN datasets.

Several acquisition processes take part in uploading datasets into the
QROWD CKAN: processes that collect datasets from the Municipality of Trento,
tools collecting crowdsourced information to complete the information about
Municipality infrastructure, interlinking and fusion processes, validation processes,
etc.

On the other hand, additional processes interact with CKAN to add different
formats of existing datasets: FIWARE transformations to integrate datasets into a
common format, GeoJSON transformations to ease visualization
in the dashboard, etc.

Apart from uploading a new dataset or adding new formats to an existing one, the
framework also allows datasets to be updated by adding a new version of an
existing format to an existing dataset. In that case, to avoid
inconsistencies between different formats of the same dataset, the update of one
format must be complemented with the update of the remaining formats in the dataset,
so that all formats reference the same content/version. Therefore, the user
should arrange not only a format update but also a set of transformations to ensure
that the different formats in a dataset version match.

To enable this, upload and transformation processes must be linked in some cases.
Section 3.6 details an example of integrated dataset acquisition and subsequent
transformation to FIWARE and GeoJSON formats for some datasets from the
Municipality of Trento. This complex dataflow ensures that all
the required transformations are triggered automatically when a new version of a dataset is uploaded. In the
example shown in Figure 18, when a new version of the impianti-sportivi dataset is
uploaded, not only is the resource containing the raw information updated, but
the FIWARE and GeoJSON formats are also updated automatically.

All the acquisitions and transformations are made through a set of NiFi-based
dataflows, which eases integration with further components in the QROWD
general architecture.

3.3. CKAN General considerations


CKAN Structure: Datasets, resources, metadata and naming conventions

In QROWD, the CKAN repository makes use of the following CKAN entities: datasets,
resources, and metadata. Dataset entities are used for the abstract concept of a
dataset, resources for the physical manifestations of a dataset in several formats,
and a set of basic metadata for dataset annotation.

Specifically, for a particular collection of data, there will be a CKAN dataset that
always contains its latest version. From now on this
dataset will be referred to as the “fresh” dataset. The “fresh” dataset has the name
DatasetID = {DatasetName} + {Publisher}. For instance, for the dataset providing
bike-rack data published by the municipality, there will be a dataset in CKAN with
id “Bike_Racks_MT”.

A dataset might have several formats for the same data; a CKAN dataset
therefore has as many resource entries as formats available for that particular
dataset. The naming convention is: {DatasetID} + {_} + {FormatName} + {.ext}.
For instance, if we have a “FIWARE” version and a “Dashboard/GeoJSON” version
of the Bike_Racks_MT dataset, the data acquisition framework allows users to have a
dataset in CKAN named “Bike_Racks_MT” with two associated resources:
“Bike_Racks_MT_FIWARE.json” and “Bike_Racks_MT_GeoJSON.json”.
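
The two naming conventions can be expressed as small helper functions. This is a minimal sketch: the function names are hypothetical, and the underscore between dataset name and publisher is an assumption inferred from the “Bike_Racks_MT” example.

```python
def dataset_id(dataset_name: str, publisher: str) -> str:
    # DatasetID = {DatasetName} + {Publisher}; the "_" separator is assumed
    # from the "Bike_Racks_MT" example.
    return f"{dataset_name}_{publisher}"


def resource_name(ds_id: str, format_name: str, ext: str) -> str:
    # {DatasetID} + {_} + {FormatName} + {.ext}
    return f"{ds_id}_{format_name}.{ext}"


bike_racks = dataset_id("Bike_Racks", "MT")
resources = [resource_name(bike_racks, fmt, "json") for fmt in ("FIWARE", "GeoJSON")]
```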

Finally, for the annotations associated with a dataset, there will be a metadata file, in
the form of a new resource, attached to the dataset with the name {DatasetID} + {_} +
{Metadata} + {.ext}. Not all datasets will have an associated metadata file. In addition
to this metadata file, all datasets are annotated with a set of basic metadata
stored in the metadata fields of the dataset. These basic metadata are:
● Owner/Organization
● Visibility
● tag1: “Project: String”. By default “QROWD”
● tag2: “Type versioning”:{QROWD_lastVersion, QROWD_historical}
● tag3: “Type dataset”:{QROWD_source, QROWD_fusion, QROWD_official}
● tag4, …, tagN: “String”. Free tags.

Owner, Visibility, tag1, tag2, tag3 are mandatory.
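
A check of the mandatory metadata could look like the sketch below. The allowed tag values are taken from the list above; the dictionary keys (`owner`, `visibility`, `project`, `type_versioning`, `type_dataset`), the function name and the example values are hypothetical and do not reflect how CKAN itself stores these fields.

```python
from typing import Dict, List

# Allowed values come from the tag definitions above; the key names are hypothetical.
ALLOWED_VALUES = {
    "type_versioning": {"QROWD_lastVersion", "QROWD_historical"},
    "type_dataset": {"QROWD_source", "QROWD_fusion", "QROWD_official"},
}
MANDATORY = ("owner", "visibility", "project", "type_versioning", "type_dataset")


def metadata_problems(metadata: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the basic metadata is complete."""
    problems = [f"missing: {key}" for key in MANDATORY if not metadata.get(key)]
    for key, allowed in ALLOWED_VALUES.items():
        value = metadata.get(key)
        if value and value not in allowed:
            problems.append(f"invalid {key}: {value}")
    return problems


example = {
    "owner": "MT",
    "visibility": "public",
    "project": "QROWD",
    "type_versioning": "QROWD_lastVersion",
    "type_dataset": "QROWD_source",
}
```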


Figure 5: QROWD CKAN structure

Versioning

The acquisition framework provides a “versioning” mechanism that lets the user
manage different versions of a particular dataset. When the content of
a dataset is modified and the user wants to store the new version without losing old
versions, the framework automatically manages the update of the dataset,
backing up the previous version. Therefore, when a new version of an existing dataset
arrives in the system:
1. A new dataset is created to back up all the existing formats (resources) of the
“fresh” dataset. The backup dataset has the same name as the “fresh”
dataset plus a timestamp: DatasetID + “timestamp”. It can be considered a snapshot (a
copy of the state) of the “fresh” dataset at a particular point in time.
2. The “fresh” dataset (which always contains the latest version) is
overwritten with the new version. To do that, the resource that
matches the format of the new version is replaced by the new version,
and the remaining, outdated resources are removed from the “fresh” dataset.
3. If any format should be recreated based on the new version, a notification
should be sent to the components in charge of creating new formats, so that they
complete the “fresh”, latest version of the dataset. The notification could be
sent through the Context Broker or any other mechanism managed by NiFi
processors.

Figure 6: QROWD CKAN versioning

Figure 6 shows an abstract view of how the versioning takes place: one dataset
containing the latest version (always the same dataset) and N datasets for the historical
versions.

A CKAN repository example of the idea depicted above can be seen in Figure 7.
Four CKAN datasets for the same “zone-traffico-limitato” dataset can be seen:
● “zone-traffico-limitato”, containing the latest version, and
● Three additional ones storing old versions:
○ “zone-traffico-limitato20181005_124813”, a version frozen on
October 5, 2018.
○ “zone-traffico-limitato20181004_171050”, a version frozen on
October 4, 2018 at 17:10:50.
○ “zone-traffico-limitato20181004_164049”, a version frozen on
October 4, 2018 at 16:40:49.
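
The three versioning steps can be sketched against an in-memory stand-in for the CKAN repository. This is not the framework's implementation: the `Catalog` type, the function name and the resource names are hypothetical, and the timestamp pattern `%Y%m%d_%H%M%S` is inferred from the backup names listed above.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical in-memory stand-in for a CKAN repository:
# dataset name -> {resource name -> content}.
Catalog = Dict[str, Dict[str, bytes]]


def upload_new_version(catalog: Catalog, dataset_id: str, resource_name: str,
                       content: bytes, now: datetime) -> List[str]:
    """Apply the versioning steps and return the resources that must be regenerated."""
    outdated: List[str] = []
    if dataset_id in catalog:
        # Step 1: snapshot every resource of the "fresh" dataset under DatasetID + timestamp.
        backup_id = dataset_id + now.strftime("%Y%m%d_%H%M%S")
        catalog[backup_id] = dict(catalog[dataset_id])
        # Resources in other formats are outdated and are dropped in step 2.
        outdated = sorted(name for name in catalog[dataset_id] if name != resource_name)
    # Step 2: the "fresh" dataset keeps only the new version of the uploaded format.
    catalog[dataset_id] = {resource_name: content}
    # Step 3: the caller notifies the components that recreate the outdated formats.
    return outdated


catalog: Catalog = {
    "zone-traffico-limitato": {
        "zone-traffico-limitato.json": b"v1",
        "zone-traffico-limitato_GeoJSON.json": b"v1-geo",
    }
}
to_regenerate = upload_new_version(
    catalog, "zone-traffico-limitato", "zone-traffico-limitato.json",
    b"v2", datetime(2018, 10, 5, 12, 48, 13),
)
```

Returning the outdated resource names, rather than regenerating them in place, mirrors step 3: the notification is left to another component.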


Figure 7: QROWD CKAN example

Ownership

According to the CKAN documentation: “Each dataset can belong to one or more
organization. And each organization controls access to its datasets”¹³. However, from
the QROWD acquisition point of view, each dataset belongs to one single organization:
the content creator. It is therefore possible to find two datasets containing the same kind of
information (e.g. bike racks in Trento) but with different creators (and therefore presumably
different content), which results in two different datasets in CKAN.

As the name of a CKAN dataset usually reflects the information contained within it,
and we use the name as an unambiguous identifier, this convention suggests
adding the creator name as part of the dataset name.

13 https://docs.ckan.org/en/2.8/maintaining/authorization.html


Figure 7: QROWD CKAN ownership

3.4. Core static data acquisition functionality

This section describes the NiFi core templates that implement the main
functionalities offered by the static data acquisition framework:
● CSDA-1: Upload/Update a dataset into CKAN. A generic data flow in charge
of taking a dataset (file) from a remote URL and updating a CKAN
repository. The process covers the “versioning” of a dataset:
○ uploading it as a new dataset (if the dataset does not exist), or
○ updating an existing CKAN dataset and storing the former content as a
historical dataset.
● CSDA-2: Upload/Update a dataset into CKAN (with decompression). The same
data flow as CSDA-1, but allowing the user to download a dataset in the form of
a compressed package and to extract/filter the files to be uploaded.
● CSDA-3: JSON-based transformations/versions of a dataset. By means of this
functionality the user is able to download a resource from a remote URL,
perform a set of transformations on the dataset and upload the result as a
new format of an existing dataset. The transformations are in terms of:
○ format: the original dataset is transformed into JSON format;
○ structure: after the JSON transformation the user can
perform transformations on the JSON structure, that is,
change the names of fields, create new JSON objects, or delete
existing ones.
● CSDA-4: External-script transformation/format of a dataset. Offers the user the
possibility of downloading a file from a remote URL, performing custom
transformations and uploading the result as a new format for a particular dataset.

These core templates can be used in two different ways:

1. Isolated templates for basic functionalities, for instance to upload a
dataset, or to create and upload a new format of an existing dataset. Running
these functionalities is just a matter of downloading the templates and
configuring them following the guidelines proposed in Section 3.5.
2. Combination of basic templates to create complex topologies for more
advanced functionalities, for instance to create a chain of transformations
and upload the resulting datasets or formats into CKAN by linking core
templates, as described in Section 3.6. To run these complex
functionalities, users should have some NiFi knowledge in order to establish, for
instance, the required connections between dataflows and processors.

Finally, note that CSDA flows are based on standard NiFi processors and on
new NiFi processors developed specifically in the context of QROWD. For each
processor, it is indicated whether it is a standard or a custom processor.
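
The structure transformations named for CSDA-3 above (changing field names, creating new JSON objects, deleting existing ones) can be illustrated with a short Python sketch. It is written in plain Python rather than with NiFi processors, and the function name, the field names and the sample record are hypothetical.

```python
from typing import Any, Dict, Iterable


def restructure(record: Dict[str, Any], renames: Dict[str, str],
                drops: Iterable[str], additions: Dict[str, Any]) -> Dict[str, Any]:
    """Delete fields, rename the remaining ones, then add new objects."""
    dropped = set(drops)
    result = {renames.get(key, key): value
              for key, value in record.items() if key not in dropped}
    result.update(additions)
    return result


source = {"nome": "Piazza Dante", "posti": 10, "note": ""}
transformed = restructure(
    source,
    renames={"nome": "name", "posti": "totalSlotNumber"},
    drops=["note"],
    additions={"type": "BikeRack"},
)
```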
3.4.1. CSDA-1: Upload/Update a dataset into CKAN (from a URL)

One of the main functionalities of the acquisition framework is to let users
automatically upload/update a dataset in CKAN, with
versioning support. The flow takes the dataset, in the form of a file, from a
remote URL and uploads it into a new dataset if the dataset does not yet exist
in CKAN, or updates the existing dataset otherwise.

According to the versioning mechanism described in Section 3.3, updating an
existing dataset implies creating a backup of the “fresh” dataset and replacing
its content with more recent information.

Examples of use are:

● partners acquiring datasets from the Municipality of Trento,
● processes generating new datasets from crowdsourcing services,
● partners acquiring datasets from OpenStreetMap.

Implementation with NiFi

Figure 8 shows different NiFi processors used in the dataflow.


Figure 8: QROWD CSDA-1 NiFi processors

● “InvokeHTTP” (standard), processor in charge of downloading a file from a
remote URL.
● “Define_Package_Name” (standard), processor in charge of adding custom
attributes to the data flow. In particular, this processor adds:
○ an attribute to store the name of the dataset in CKAN, and
○ an attribute to give a name to the FlowFile. This name is used as
the name of the resource (or file) to be added to CKAN.
● “CKAN_Package_Backup” (custom), processor in charge of backing up a
CKAN dataset. The backup has the same name as the copied dataset plus a
timestamp (backup time). The copied dataset remains as it is.
● “CKAN_Flowfile_Uploader” (custom), processor in charge of uploading a new
resource (file) into a CKAN dataset.

3.4.2. CSDA-2: Upload/Update a dataset into CKAN (from a URL + decompress)

This process is almost the same as the previous one, with the additional functionality of
decompressing the downloaded file and filtering the extracted files to determine
which of them go through the uploading process. The previous process
downloads exactly one file, which is then uploaded into
CKAN. This process allows the user to download a file/package,
decompress the package and apply a filtering step to select the files the user is
interested in. The files resulting from this decompress/filter step go through
the uploading process.

Examples of use:


● Most of the datasets downloaded from the Municipality of Trento and used in
the QROWD dashboard.

Implementation with NiFi

Figure 9 shows the NiFi processors that take part in the dataflow.

Figure 9: QROWD CSDA-2 NiFi processors

This data flow adds one processor to those described in the previous section:
● “UnpackContent” (standard): a processor in charge of extracting files from a
compressed package and filtering the desired files that will be incorporated into
the workflow.
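The decompress-and-filter step can be illustrated with the Python standard library. This is a sketch of the idea only, not of how “UnpackContent” is implemented: it assumes a zip package, and the function names and the filter `.*\.(gml|xml)` are chosen for the example.

```python
import io
import re
import zipfile


def unpack_and_filter(package: bytes, file_filter: str) -> dict[str, bytes]:
    """Extract the files of a zip package whose names match file_filter."""
    pattern = re.compile(file_filter)
    selected = {}
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        for info in archive.infolist():
            # Skip directory entries; keep only files whose full name matches.
            if not info.is_dir() and pattern.fullmatch(info.filename):
                selected[info.filename] = archive.read(info)
    return selected


def build_sample_package() -> bytes:
    """Create a small in-memory zip used only to exercise the sketch."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("impianti_sportivi.gml", "<gml/>")
        archive.writestr("impianti_sportivi.xsd", "<schema/>")
        archive.writestr("readme.txt", "notes")
    return buffer.getvalue()


if __name__ == "__main__":
    files = unpack_and_filter(build_sample_package(), r".*\.(gml|xml)")
    print(sorted(files))
```

Each selected file would continue through the rest of the flow as its own flow file, while the other extracted files are dropped.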

3.4.3. CSDA-3: JSON-based transformations + Update a dataset into CKAN (without versioning)


This dataflow offers the basic functionality of a typical JSON-based DTL
(download, transform and load) process. It allows the user to download a JSON
dataset from a URL, transform the original format into GeoJSON, change the JSON
structure, and upload the resulting dataset into CKAN.

This dataflow does not back up any dataset: it produces new formats of existing
versions, not new versions. The output of the transformation (the new format) is
therefore assumed to be uploaded as a new resource (new file) into an existing
dataset, namely the dataset from which the file to be transformed was downloaded.

Examples of use are:

● Original datasets transformed into the FIWARE data model.

Implementation with NiFi

Figure 10 shows the NiFi processors that take part in the dataflow.

Figure 10: QROWD CSDA-3 NiFi processors


● “InvokeHTTP” (standard): processor in charge of downloading a file from a remote URL.
● “SplitJson” (standard): processor in charge of splitting an input JSON array into individual entities. Through a JSON path expression, the user selects the array whose elements are to be split. From this point on, there are as many flowfiles as elements in the selected array.
● “EvaluateJsonPath” (standard): processor in charge of putting JSON content into attributes of the flowfile. The user can create as many new attributes as pieces of information need to be stored from the JSON content.
● “UpdateAttribute” (standard): processor in charge of adding custom attributes to the data flow. In particular, this processor adds:
   ○ an attribute to store the type of the entities that are flowing through the data flow.
● “JoltTransformJSON” (standard): processor in charge of JSON to JSON transformations based on a Jolt specification 14.
● “MergeContent” (standard): processor in charge of joining the different entities in the dataflow into a single entity.
● “UpdateAttribute” (standard): processor in charge of adding custom attributes to the data flow. In particular, this processor adds:
   ○ an attribute to give a name to the flow file. This name will be used as the name of the resource (or file) to be added to CKAN.
● “CKAN_Flowfile_Uploader” (custom): processor in charge of uploading a new resource (file) into a CKAN dataset.
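The split, transform and merge stages can be mimicked in a few lines of Python to make the data movement concrete. This is not a Jolt specification and not what the NiFi processors execute: the input fields (`id`, `name`, `lat`, `lon`), the entity type and the function names are hypothetical, and the per-element transformation stands in for the step that “JoltTransformJSON” performs.

```python
import json


def split_json(document: str) -> list[dict]:
    """Split a top-level JSON array into its elements (one per flow file)."""
    items = json.loads(document)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return items


def to_feature(item: dict, entity_type: str) -> dict:
    """Turn one element into a GeoJSON Feature (coordinates are lon, lat)."""
    return {
        "type": "Feature",
        "id": item["id"],
        "geometry": {"type": "Point",
                     "coordinates": [item["lon"], item["lat"]]},
        "properties": {"name": item["name"], "entityType": entity_type},
    }


def merge_features(features: list[dict]) -> str:
    """Join the transformed elements back into a single document."""
    return json.dumps({"type": "FeatureCollection", "features": features})


def run_flow(document: str, entity_type: str) -> str:
    """Split, transform each element, then merge."""
    return merge_features(
        [to_feature(item, entity_type) for item in split_json(document)])


if __name__ == "__main__":
    sample = json.dumps([
        {"id": "a1", "name": "Campo A", "lat": 46.07, "lon": 11.12},
        {"id": "b2", "name": "Campo B", "lat": 46.06, "lon": 11.13},
    ])
    print(run_flow(sample, "SportsFacility"))
```

Transforming one element at a time keeps the transformation rule simple, which is why the flow splits the array first and merges the results at the end.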

3.4.4. CSDA-4: External-script transformations + Update a dataset into CKAN (without versioning)

This template makes it possible to download a dataset (input file) from a remote
URL, transform the dataset using external procedures and store the resulting
dataset (output file) in a CKAN repository. The transformations are command
line-based executions of external scripts developed by the user.

As in the previous case, this dataflow does not generate a backup of any dataset. It
is worth mentioning that the output of the transformation (the new format) is
assumed to be uploaded as a new resource (new file) into an existing dataset, in
particular into the dataset from which the file to be transformed was downloaded.

Examples of use are:

● the transformation into GeoJSON format for the dashboard, which does not need
to generate a new version of a dataset.

Implementation with NiFi

14 http://jolt-demo.appspot.com/#inception


Figure 11: QROWD CSDA-4 NiFi processors

Figure 11 shows the different processors used in the data flow and the properties
that should be configured by the user for each of them:

● “InvokeHTTP” (standard): processor in charge of downloading a dataset (file) from the remote URL. The downloaded file will be the flow file.
● “UpdateAttribute” (standard): processor in charge of adding custom attributes to the data flow. In particular, this processor adds:
   ○ an attribute to give a name to the flow file. This name will be used as the file name in the following processor.
● “PutFile” (standard): processor in charge of writing a flow file to disk.
● “ExecuteStreamCommand” (standard): processor in charge of transforming an input file into an output file by executing an external command line-based script. As a suggestion, the processor can be configured with:
   ○ “ogr2ogr” as command path (script), which performs simple transformations between file formats.
   ○ “-f;outputFormat;outputFile;inputFile;-s_srs;EPSG:25832;-t_srs;EPSG:4326” as command arguments. The outputFile and the inputFile will be produced in and picked up from, respectively, the directory defined in PutFile.


● “FetchFile” (standard): processor in charge of fetching a local file as a flow file.
● “UpdateAttribute” (standard): processor in charge of adding custom attributes to the data flow. In particular, this processor adds:
   ○ an attribute to give a name to the flow file. This name will be used as the name of the resource (or file) to be added to CKAN.
● “CKAN_Flowfile_Uploader” (custom): processor in charge of uploading a new resource (file) into a CKAN dataset.
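Outside NiFi, the same external transformation can be tried from a script. The sketch below builds the ogr2ogr command suggested above and runs it with `subprocess`. It assumes that ogr2ogr (GDAL) is installed and on the PATH; the function names and the default output format are choices made for the example.

```python
import subprocess


def ogr2ogr_command(input_file: str, output_file: str,
                    output_format: str = "GeoJSON",
                    source_srs: str = "EPSG:25832",
                    target_srs: str = "EPSG:4326") -> list[str]:
    """Build the ogr2ogr call, in the argument order suggested in the text."""
    return ["ogr2ogr", "-f", output_format, output_file, input_file,
            "-s_srs", source_srs, "-t_srs", target_srs]


def transform(input_file: str, output_file: str) -> None:
    """Run the transformation; raises CalledProcessError if ogr2ogr fails."""
    subprocess.run(ogr2ogr_command(input_file, output_file), check=True)


if __name__ == "__main__":
    # Print the command only; call transform() once ogr2ogr is available.
    print(" ".join(ogr2ogr_command("impianti_sportivi.gml",
                                   "impianti_sportivi_GEOJSON.json")))
```

Passing the command as a list avoids shell quoting problems with file names, which plays the same role as the delimiter-separated argument list of the processor.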

3.5. Guided procedures

This section provides a set of recommendations and basic instructions that help users
configure and create their own data flows based on the templates provided by the
Static Data Acquisition Framework (CSDA).

General recommendations

● API Key-based authorization is required for operating with the CKAN API. To
obtain a user and an API Key, please register at http://CKAN_host/user/register
● To configure a processor in NiFi, double-click on it and go to the “properties”
tab.

Figure 12: NiFi processor configuration
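For readers who want to check their API Key outside NiFi, the following sketch builds an authenticated call to the CKAN action API with the Python standard library. It assumes a CKAN instance that accepts the key in the `Authorization` header and exposes the action API under `/api/3/action/`; the host name, the key and the helper names are placeholders.

```python
import json
import urllib.request


def ckan_request(ckan_host: str, action: str, api_key: str,
                 payload: dict) -> urllib.request.Request:
    """Build (without sending) an authenticated CKAN action API request."""
    return urllib.request.Request(
        url=f"http://{ckan_host}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    request = ckan_request("CKAN_host", "package_show", "my-api-key",
                           {"id": "impianti-sportivi-mt"})
    print(request.get_method(), request.full_url)
    # send(request) would perform the call against a reachable CKAN host.
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any call reaches the repository.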


CSDA-1: Upload/Update a dataset into CKAN (from a URL)

Figure 13 shows the NiFi GUI for adding a new element to the NiFi dataflow based on
the previous templates. In this case the user should select the “Upload_CKAN”
template; the instructions to run it follow.

Figure 13: CSDA-1 NiFi template

1. Step 1: Go to the “InvokeHTTP” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Remote URL”: the remote URL from which to get the dataset.
   - Others. Apart from the mandatory properties, the processor has other properties that can be left as they are or configured by the user. Some of them are:
      - “Basic Authentication Username” and “Basic Authentication Password”, which can be used for downloads that require authentication.
      - In the “Scheduling” tab the user can select a “Timer driven” strategy and schedule the download task (and therefore the entire workflow).
2. Step 2: Go to the “Define_Package_Name” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN_package_name”: the name of the dataset in CKAN. Only lowercase letters and hyphens are accepted.
      - “filename”: the name of the flowfile. This name will be used by “CKAN_Flowfile_Uploader” to set the name of the resource (or file) to be added to CKAN. It is a fixed string.
3. Step 3: Go to the “CKAN_Package_Backup” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Name of the package to backup”: name of the dataset to be backed up. If the dataset does not exist, nothing happens. If the dataset exists, a copy of it is created. According to the rules defined in Section 3.3, only “fresh” datasets are backed up. It should be lowercase.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to back up a dataset:
         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the backup processor, the user must always set “QROWD_historical”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_historical, QROWD_source”.
4. Step 4: Go to the “CKAN_Flowfile_Uploader” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Organization id”: the creator/owner of the dataset. This property is always needed: the first time, to add the organization to the dataset; afterwards, to check that the organization passed matches the owner of the dataset in CKAN. It should be lowercase.
      - “Name_of_the_package”: the CKAN dataset that will host the new version. According to the rules defined in Section 3.3, only “fresh” datasets keep the last version. It should be lowercase.
      - “Package visibility”: the accessibility of the dataset: “public” if anyone can access the dataset without restrictions, or “private” if access is restricted to some users.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to upload a new dataset:
         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the uploader processor, the user must set “QROWD_lastVersion”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_lastVersion, QROWD_fusion”.
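The naming and tagging rules used in these steps are easy to check before a template is started. The sketch below encodes them as stated in this guide; the function names and the `"backup"`/`"uploader"` labels are hypothetical and are not part of the custom processors.

```python
import re

PROJECT_TAG = "QROWD"
VERSION_TAG = {"backup": "QROWD_historical", "uploader": "QROWD_lastVersion"}
USAGE_TAGS = {"QROWD_source", "QROWD_fusion", "QROWD_official"}


def is_valid_package_name(name: str) -> bool:
    """Lowercase letters and hyphens only, as the guide requires."""
    return re.fullmatch(r"[a-z-]+", name) is not None


def parse_tags(tag_list: str) -> list[str]:
    """Split a comma-separated tag list and drop surrounding blanks."""
    return [tag.strip() for tag in tag_list.split(",") if tag.strip()]


def missing_tags(tag_list: str, processor: str) -> list[str]:
    """Return the mandatory tags that are absent for the given processor."""
    tags = set(parse_tags(tag_list))
    missing = []
    if PROJECT_TAG not in tags:
        missing.append(PROJECT_TAG)
    if VERSION_TAG[processor] not in tags:
        missing.append(VERSION_TAG[processor])
    if not tags & USAGE_TAGS:
        missing.append("one of " + ", ".join(sorted(USAGE_TAGS)))
    return missing


if __name__ == "__main__":
    print(is_valid_package_name("impianti-sportivi-mt"))
    print(missing_tags("QROWD, QROWD_lastVersion, QROWD_fusion", "uploader"))
    print(missing_tags("QROWD", "backup"))
```

An empty list from `missing_tags` means the tag list satisfies the three mandatory tags for that processor.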

CSDA-2: Upload/Update a dataset into CKAN (from a URL + decompress)

Figure 14 shows the NiFi GUI for adding a new element to the NiFi dataflow based on
the previous templates. In this case the user should select the
“Upload_CKAN_with_decompresion” template; the instructions to run it follow.

Figure 14: CSDA-2 NiFi template

In this case the user should also configure one additional processor, “UnpackContent”
(Step 2, inserted before Step 2 of the previous template):


1. Step 1: Go to the “InvokeHTTP” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Remote URL”: the remote URL from which to get the dataset.
   - Others. Apart from the mandatory properties, the processor has other properties that can be left as they are or configured by the user. Some of them are:
      - “Basic Authentication Username” and “Basic Authentication Password”, which can be used for downloads that require authentication.
      - In the “Scheduling” tab the user can select a “Timer driven” strategy and schedule the download task (and therefore the entire workflow).
2. Step 2: Go to the “UnpackContent” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Packaging format”: type of compressed package. By default, the “mime.type” value lets the processor detect the format automatically and decompress the file.
      - “File filter”: which of the extracted files the user is interested in. It is set by indicating the extension(s) of the file(s) by means of a regular expression such as: [.gml|.xml] or [.gml|.xml|.zip]
3. Step 3: Go to the “Define_Package_Name” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN_package_name”: the name of the dataset in CKAN. Only lowercase letters and hyphens are accepted.
4. Step 4: Go to the “CKAN_Package_Backup” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Name of the package to backup”: name of the dataset to be backed up. If the dataset does not exist, nothing happens. If the dataset exists, a copy of it is created. According to the rules defined in Section 3.3, only “fresh” datasets are backed up. It should be lowercase.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to back up a dataset:
         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the backup processor, the user must always set “QROWD_historical”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_historical, QROWD_source”.
5. Step 5: Go to the “CKAN_Flowfile_Uploader” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Organization id”: the creator/owner of the dataset. This property is always needed: the first time, to add the organization to the dataset; afterwards, to check that the organization passed matches the owner of the dataset in CKAN. It should be lowercase.
      - “Name_of_the_package”: the CKAN dataset that will host the new version. According to the rules defined in Section 3.3, only “fresh” datasets keep the last version. It should be lowercase.
      - “Package visibility”: the accessibility of the dataset: “public” if anyone can access the dataset without restrictions, or “private” if access is restricted to some users.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to upload a new dataset:
         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the uploader processor, the user must set “QROWD_lastVersion”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_lastVersion, QROWD_fusion”.
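When writing the file filter, it may be worth testing the regular expression first. In standard regular-expression syntax square brackets denote a character class (a single character), so a bracketed list of extensions does not behave like a list of alternatives. The sketch below compares the two forms in Python; whether NiFi applies the filter to the whole file name or to part of it is not confirmed here, so both cases are shown, and the file names are illustrative.

```python
import re

NAMES = ["impianti_sportivi.gml", "schema.xml", "readme.txt"]

# One single character out of: . g m l | x
CHARACTER_CLASS = re.compile(r"[.gml|.xml]")
# Any name ending in .gml or .xml
ALTERNATION = re.compile(r".*\.(gml|xml)")


def whole_name_matches(pattern: re.Pattern, names: list[str]) -> list[str]:
    """Names accepted when the pattern must match the entire file name."""
    return [name for name in names if pattern.fullmatch(name)]


def partial_matches(pattern: re.Pattern, names: list[str]) -> list[str]:
    """Names accepted when the pattern may match any part of the file name."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print(whole_name_matches(CHARACTER_CLASS, NAMES))
    print(partial_matches(CHARACTER_CLASS, NAMES))
    print(whole_name_matches(ALTERNATION, NAMES))
```

With these sample names the character class either rejects or accepts every file, while the alternation form keeps only the `.gml` and `.xml` files.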

CSDA-3: JSON-based transformations + Update a dataset into CKAN

Figure 15 shows the NiFi GUI for adding a new element to the NiFi dataflow based on
the previous templates. In this case the user should select the
“JSON_based_transformation_upload_CKAN” template; the instructions to run it follow.


Figure 15: CSDA-3 NiFi template

1. Step 1: Go to the “InvokeHTTP” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Remote URL”: the remote URL from which to get the dataset.
   - Others. Apart from the mandatory properties, the processor has other properties that can be left as they are or configured by the user. Some of them are:
      - “Basic Authentication Username” and “Basic Authentication Password”, which can be used for downloads that require authentication.
      - In the “Scheduling” tab the user can select a “Timer driven” strategy and schedule the download task (and therefore the entire workflow).
2. Step 2: Go to the “SplitJson” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “JsonPath Expression”: enter a JSON path expression that indicates how to split the input JSON. For example, with “$.*” this processor splits the flow file (at this point, just a single one) into as many flow files as there are elements in the top-level array of the JSON input. JSON path expressions can be tested online 15.
3. Step 3: Go to the “EvaluateJsonPath” processor, open the “properties” tab and add
(button +) attributes to store JSON content from each flow file. This
information may be needed by later processors.
15 http://jsonpath.com/


   - Mandatory:
      - “id”: a JSON path expression that retrieves the id of the JSON object. Needed for the merge.
   - Other:
      - Custom user attributes added to the flowfile attributes. The values of these attributes are retrieved from the JSON object processed as flowfile. These attributes can be used in the “JoltTransformJSON” processor.
4. Step 4: Go to the “UpdateAttribute” processor, open the “properties” tab and fill in:
   - Other:
      - “type”: a string that indicates a shared type for all the elements. It can be useful to indicate the FIWARE type when the transformation is from a JSON structure to a FIWARE structure.
5. Step 5: Go to the “JoltTransformJSON” processor, go to “Advanced” (button on the
left side) and enter the mandatory Jolt specification, which transforms an
input JSON structure into another JSON structure. A reference can be found
here16. A new flowfile is created with the new structure.
6. Step 6: The “MergeContent” processor does not need to be parameterized. After
this processor all the flowfiles are merged into a single one. Internally the
processor makes use of the attribute “id” defined in “EvaluateJsonPath”.
7. Step 7: Go to the “UpdateAttribute” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “filename”: the name of the flowfile. This name will be used by “CKAN_Flowfile_Uploader” to set the name of the resource (or file) to be added to CKAN. It is a fixed string.
8. Step 8: Go to the “CKAN_Flowfile_Uploader” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Organization id”: the creator/owner of the dataset. This property is always needed: the first time, to add the organization to the dataset; afterwards, to check that the organization passed matches the owner of the dataset in CKAN. It should be lowercase.
      - “Name_of_the_package”: the CKAN dataset that will host the new format. According to the rules in Section 3.3, the output of the transformation (the new format) is assumed to be uploaded as a new resource (new file) into an existing dataset, namely the dataset from which the file to be transformed was downloaded. It should be lowercase.
      - “Package visibility”: the accessibility of the dataset: “public” if anyone can access the dataset without restrictions, or “private” if access is restricted to some users.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to upload a new dataset:

16 http://jolt-demo.appspot.com/#inception


         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the uploader processor, the user must set “QROWD_lastVersion”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_lastVersion, QROWD_fusion”.

CSDA-4: External-script transformations + Update a dataset into CKAN

Figure 16 shows the NiFi GUI for adding a new element to the NiFi dataflow based on
the previous templates. In this case the user should select the
“External_script_transformation_upload_CKAN” template; the instructions to run it
follow.

Figure 16: CSDA-4 NiFi template


1. Step 1: Go to the “InvokeHTTP” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Remote URL”: the remote URL from which to get the dataset.
   - Others. Apart from the mandatory properties, the processor has other properties that can be left as they are or configured by the user. Some of them are:
      - “Basic Authentication Username” and “Basic Authentication Password”, which can be used for downloads that require authentication.
      - In the “Scheduling” tab the user can select a “Timer driven” strategy and schedule the download task (and therefore the entire workflow).
2. Step 2: Go to the “UpdateAttribute” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “filename”: the name of the flowfile, which the “PutFile” processor needs in order to write the file to the directory. It is a fixed string.
3. Step 3: Go to the “PutFile” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “directory”: the directory where the flowfile payload will be written as a file. It will be the input file for the next processor.
4. Step 4: Go to the “ExecuteStreamCommand” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “Command Path”: the command needed to run the script. For example: “ogr2ogr”, a tool that is included and can transform other formats into JSON.
      - “Command Argument”: the arguments needed to run the command. For example, for ogr2ogr: -f;“output desired format”;“output file”;“input file”;-s_srs;EPSG:25832;-t_srs;EPSG:4326
5. Step 5: Go to the “FetchFile” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “File to Fetch”: set the “output file”. It must be the same as the output file produced by the previous processor.
6. Step 6: Go to the “UpdateAttribute” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “filename”: the name of the flowfile. This name will be used by “CKAN_Flowfile_Uploader” to set the name of the resource (or file) to be added to CKAN. It is a fixed string.
7. Step 7: Go to the “CKAN_Flowfile_Uploader” processor, open the “properties” tab and fill in:
   - Mandatory:
      - “CKAN Url”: URL of the CKAN repository.
      - “File Api_key”: API Key for operating with the CKAN API.
      - “Organization id”: the creator/owner of the dataset. This property is always needed: the first time, to add the organization to the dataset; afterwards, to check that the organization passed matches the owner of the dataset in CKAN. It should be lowercase.
      - “Name_of_the_package”: the CKAN dataset that will host the new format. According to the rules in Section 3.3, the output of the transformation (the new format) is assumed to be uploaded as a new resource (new file) into an existing dataset, namely the dataset from which the file to be transformed was downloaded. It should be lowercase.
      - “Package visibility”: the accessibility of the dataset: “public” if anyone can access the dataset without restrictions, or “private” if access is restricted to some users.
      - “Comma-separated Tag List”: alphanumeric tags chosen by the user. As explained in Section 3.3, at least three tags must be provided to upload a new dataset:
         - One tag indicating the project name (a string). In the context of the QROWD project the user must always set “QROWD”.
         - One tag indicating the type of dataset. Since this is the uploader processor, the user must set “QROWD_lastVersion”.
         - One tag indicating the type of dataset from a usage point of view. In the context of QROWD the user has to select one of the following values: {QROWD_source, QROWD_fusion, QROWD_official}.
         - Example: “QROWD, QROWD_lastVersion, QROWD_fusion”.
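Steps 3 to 5 only work if the three processors agree on the same paths: the file written by “PutFile” is the input of the command, and the output of the command is the file that “FetchFile” picks up. The sketch below derives the three settings from one directory and two file names so that they cannot drift apart. The dictionary keys are labels chosen for the example, and the ";" separator follows the argument format shown in Step 4.

```python
from pathlib import PurePosixPath


def csda4_properties(directory: str, input_name: str, output_name: str,
                     output_format: str = "GeoJSON") -> dict[str, str]:
    """Derive consistent settings for PutFile, the command and FetchFile."""
    work_dir = PurePosixPath(directory)
    input_file = str(work_dir / input_name)    # written by PutFile
    output_file = str(work_dir / output_name)  # produced by the command
    arguments = ";".join([
        "-f", output_format, output_file, input_file,
        "-s_srs", "EPSG:25832", "-t_srs", "EPSG:4326",
    ])
    return {
        "PutFile / directory": str(work_dir),
        "ExecuteStreamCommand / Command Path": "ogr2ogr",
        "ExecuteStreamCommand / Command Arguments": arguments,
        "FetchFile / File to Fetch": output_file,
    }


if __name__ == "__main__":
    properties = csda4_properties("/tmp/qrowd", "impianti_sportivi.gml",
                                  "impianti_sportivi_GEOJSON.json")
    for name, value in properties.items():
        print(f"{name}: {value}")
```

The directory `/tmp/qrowd` is a placeholder; any directory writable by the NiFi process would play the same role.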

3.6. Application of static core templates: Integrated acquisition and FIWARE/GeoJSON transformation of Municipality of Trento datasets

The static data acquisition framework provides a set of core NiFi templates that can
be used and connected to each other in order to create more complex workflows.
As part of the framework, and as an example of the use of the core templates, a set of
more advanced ad-hoc and parameterized workflows will be made available to the
project.

These workflows focus specifically on the integrated acquisition and
transformation (FIWARE, GeoJSON) of the Municipality of Trento datasets that need to
be visualized in the dashboard. In particular, each workflow is composed of the
following sub-workflows based on the core templates:

● Sub-workflow 1: Original datasets uploaded to CKAN.
● Sub-workflow 2: Original datasets transformed into GeoJSON format.
● Sub-workflow 3: GeoJSON datasets transformed into FIWARE datasets.


Figure 17: NiFi template-based integrated acquisition and transformation

Creating these workflows requires some basic NiFi knowledge, since it is necessary
to add some NiFi components (ports, connections, etc.) to and between the existing
templates.

Figure 18 shows an example workflow, for the acquisition of the “impianti-sportivi”
dataset.


Figure 18: impianti-sportivi dataflow

● sub-workflow “impianti-sportivi Acquisition”: built from the CSDA-1 template, it aims to:
   ○ download the “Sport facilities” dataset from the Municipality services 17, and
   ○ upload it into CKAN as the resource “impianti_sportivi.gml” of a dataset named “impianti-sportivi-mt”.
● sub-workflow “impianti-sportivi GeoJSON Transformation”: built from the CSDA-4 template, it aims to:
   ○ download the “impianti_sportivi.gml” resource from the dataset “impianti-sportivi-mt”,
   ○ transform it into GeoJSON format, and
   ○ upload the new format as the new resource “impianti_sportivi_GEOJSON.json” in the “impianti-sportivi-mt” dataset.
● sub-workflow “impianti-sportivi FIWARE Transformation”: built from the CSDA-3 template, it aims to:
   ○ download the “impianti_sportivi_GEOJSON.json” resource from the dataset “impianti-sportivi-mt”,
   ○ transform it into a FIWARE format, and
   ○ upload the new format as the new resource “impianti_sportivi_FIWARE.json” in the “impianti-sportivi-mt” dataset.

Each of the sub-workflows now incorporates input and/or output ports, a new NiFi
component, to enable the connection between them.
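The chaining of the three sub-workflows can be written down as data and checked: each sub-workflow must read the resource that the previous one uploaded. The sketch below uses the resource and template names given above; the `SubWorkflow` class and the `is_chained` helper are illustrative only and do not correspond to NiFi components.

```python
from dataclasses import dataclass

DATASET = "impianti-sportivi-mt"


@dataclass(frozen=True)
class SubWorkflow:
    name: str             # name of the sub-workflow in the NiFi canvas
    template: str         # core template it is built from
    source: str           # what it downloads
    output_resource: str  # resource it uploads into DATASET


PIPELINE = [
    SubWorkflow("impianti-sportivi Acquisition", "CSDA-1",
                "Municipality of Trento service", "impianti_sportivi.gml"),
    SubWorkflow("impianti-sportivi GeoJSON Transformation", "CSDA-4",
                "impianti_sportivi.gml", "impianti_sportivi_GEOJSON.json"),
    SubWorkflow("impianti-sportivi FIWARE Transformation", "CSDA-3",
                "impianti_sportivi_GEOJSON.json",
                "impianti_sportivi_FIWARE.json"),
]


def is_chained(pipeline: list[SubWorkflow]) -> bool:
    """True if every sub-workflow reads what the previous one uploaded."""
    return all(following.source == previous.output_resource
               for previous, following in zip(pipeline, pipeline[1:]))


if __name__ == "__main__":
    for step in PIPELINE:
        print(f"{step.template}: {step.source} -> "
              f"{DATASET}/{step.output_resource}")
    print("chained:", is_chained(PIPELINE))
```

Writing the chain as data makes it straightforward to describe further datasets with the same three-stage structure.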

17 http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=impianti_sportivi&fr=gml


3.7. Deployment and release of the framework


Deployment

The following NiFi components have been deployed on the QROWD server in
Leipzig, hosted by InfAI (partner number 7):
● Basic Templates
○ CSDA-1: Upload/Update a dataset into CKAN (from a URL)
○ CSDA-2: Upload/Update a dataset into CKAN (from a URL +
decompress)
○ CSDA-3: JSON-based transformations + Update a dataset into CKAN
○ CSDA-4: External-script transformations + Update a dataset into CKAN
● N advanced workflows. Please refer to the table in ANNEX 1: Deployed dataflows
for the list of deployed dataflows.
● Bundles

Release

The CSDA templates and the new NiFi processors developed for them are released
under the Apache 2.0 licence in the following QROWD git repositories:
- CSDA templates 18: collection of templates created for the QROWD project to
be used in NiFi.
- NiFi CKAN processors 19: custom Apache NiFi processors to upload files to
CKAN.

3.8. Static Requirements Validation

D4.1 defined a set of requirements that the ACSD framework should fulfil.
This section assesses the degree of fulfillment of those requirements, as
depicted in Table 1.

Table 1: Static requirements validation (FR)


| ID | Requirement name | Description | Notes | V | Validation Notes |
|---|---|---|---|---|---|
| DA-101 | Dataset consolidation | The ACSD should collect datasets from several data sources | i.e: Able to get datasets from several portals, from open services... | A | |
| DA-102 | Dataset access type | The ACSD should handle different ways of accessing the datasets | i.e: Retrieve data directly from files or from web API,... | PA | Any resource accessible from a remote URL |
| DA-103 | Dataset storage | The ACSD should provide an intermediate repository or dataset portal to store (can store the resource internally, or store it simply as a link) acquired datasets | i.e: An intermediate repository (CKAN-like) that feeds the RDF-transformation component(s) | A | CKAN repository |
| DA-104 | Metadata about data | Information or “metadata” about the data | i.e: Information about the data: title, publisher, date, schema file, provenance... | A | Each dataset will have basic tags for owner (publisher), visibility, and an associated metadata file for provenance metadata |
| DA-105 | Access mechanisms | The ACSD should provide means to allow subsequent systems to browse and find the data | i.e: RDF-transformation component should be able to access the data | PA | The Backend consumer system will allow retrieving datasets based on their identifier |
| DA-106 | Batch job execution | Ideally, the ACSD should pull data from the source systems at scheduled intervals using batch components. | i.e. Process executed as a batch job | A | The entry point NiFi processor (InvokeHTTP) of each acquisition template allows scheduling |
| DA-107 | Definition of datasets acquisition process | A set of first steps toward starting up the acquisition process should be defined | i.e: For each new dataset define the process to import it including: access way, transformations, metadata, timeliness, etc... | A | For each acquisition template a guide is provided with a detailed set of steps to be followed |
| DA-108 | Versioning | The ACSD should be able to track different versions of datasets | i.e: Update datasets with new releases | A | A versioning mechanism is provided and implemented in the acquisition framework |
| DA-009 | Security restrictions | Datasets can be public or private | i.e: Some of the used datasets are for private usage | A | Datasets in CKAN can be public or private |
| DA-110 | ETL process | For some datasets the ACSD could implement a classical ETL process | i.e: A simple extraction, transformation and loading process could be applied to specific datasets | A | A “JSON-based transformation and upload CKAN” template is provided to allow a basic extract, transformation and upload process |
| DA-111 | File format access type | The ACSD should be able to access/transform data from several file formats | i.e: XML, JSON, GML | A | The “External-script transformation and upload CKAN” template allows transformations between different formats |

18 https://github.com/QROWD/NiFi-templates
19 https://github.com/QROWD/nifiCkanProcessor

Non-functional requirements

Table 2: Static requirements validation (NFR)


| ID | Requirement name | Description | Notes | V | Validation Notes |
|---|---|---|---|---|---|
| DA-112 | Flexibility | The ACSD should be able to incorporate unknown, new or changing datasets. | i.e: The load of a new dataset should be carried out with the fewest possible changes | A | Each of the acquisition templates allows the incorporation of a new dataset just by adjusting parameters. |


Based on this, we can state that the majority of the requirements of the ACSD have
been accomplished.

4. DYNAMIC DATA ACQUISITION


4.1. Datasets

The framework described in the following sections provides ways to acquire datasets of
dynamic or streaming nature.

We defined real-time or streaming data for the purposes of QROWD in D4.1 as “a
type of dynamic data with a very high or continuously rate of change and usually
with the need to be consumed immediately after its production”. Therefore, in the
case of the pilots of QROWD, a streaming dataset can be seen as a collection of data
that provides information about the current status of the city, such as real-time
availability of bike-sharing, real-time status of underground parking, etc.

Deliverable D4.1 presented a catalog of datasets of real-time or streaming nature to
be used in QROWD pilots. The data acquisition framework intends to support
that type of dataset.

4.2. Data flow

Figure 19 shows the main elements needed to acquire, update and transform dynamic datasets (e.g. underground parking status, bike-rack status for bike sharing) in the FIWARE Orion Context Broker using NiFi.

Figure 19: QROWD dynamic data acquisition framework


In the case of dynamic acquisition, a single process is in charge of:

● retrieving real-time information for the available services,
● transforming the original data into FIWARE entities,
● and uploading and updating entities in the Context Broker through the well-defined FIWARE NGSI API 20.

The entire acquisition process is implemented as a set of NiFi processors, which makes it easier to integrate with further components of the QROWD general architecture (Figure 1).

4.3. Context Broker general considerations

The main consideration when operating the component provided is that all the entities posted to or updated in the Context Broker should conform to the FIWARE data models 21 where possible. For instance, FIWARE provides data models for parking, such as off-street parking 22, that are well suited to representing specific datasets managed by QROWD.
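
As an illustration of what fitting a FIWARE data model can look like, the following minimal Python sketch maps one source record to an off-street parking entity in NGSI v2 key-value form. The source field names (`code`, `name`, `lon`, `lat`, `capacity`, `free`) and the id scheme are hypothetical and are not taken from any QROWD dataset; the target attribute names are assumed to follow the published OffStreetParking model and should be checked against its specification.

```python
from typing import Any, Dict


def to_off_street_parking(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one (hypothetical) source record to an entity in keyValues form."""
    return {
        # The id scheme below is an assumption made for this example only.
        "id": f"OffStreetParking:{record['code']}",
        "type": "OffStreetParking",
        "name": record["name"],
        # GeoJSON point: longitude first, then latitude.
        "location": {
            "type": "Point",
            "coordinates": [record["lon"], record["lat"]],
        },
        "totalSpotNumber": record["capacity"],
        "availableSpotNumber": record["free"],
    }


# Made-up sample record, used only to show the shape of the result.
sample = {
    "code": "P1",
    "name": "Example car park",
    "lon": 11.12,
    "lat": 46.07,
    "capacity": 200,
    "free": 57,
}
entity = to_off_street_parking(sample)
```

In the NiFi templates this mapping is expressed through JSON-based transformations rather than Python; the sketch only shows the shape of the resulting entity.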

4.4. Core dynamic data acquisition functionality


4.4.1. CDDA-5: JSON based transformation + Update to Context
broker

This NiFi process takes a given dataset in JSON format, transforms it into the FIWARE data model and uploads or updates the result in the Context Broker, as shown in Figure 20.

20 https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/FI-WARE_NGSI-10_Open_RESTful_API_Specification
21 https://www.fiware.org/developers/data-models/
22 https://fiware-datamodels.readthedocs.io/en/latest/Parking/OffStreetParking/doc/spec/index.html


Figure 20: QROWD CDDA-1 NiFi processors

The dataflow shown in Figure 20 consists of two main processor groups:

- “ConvertToFiware”, a group of processors in charge of transforming the input JSON into the FIWARE data model. This processor group flow is shown in Figure 21.
- “Copy of FiwareRESTAPIHandler”, a group of processors in charge of uploading entities to the FIWARE Context Broker. This processor group flow is shown in Figure 22.


Figure 21: QROWD CDDA-1 Convert to FIWARE NiFi processors

The logic of the dataflow shown in Figure 21 is quite similar to the initial steps
presented in the static template “JSON-based transformations + Update a dataset
into CKAN” in Sections 3.4 and 3.5. Refer back to those sections for further details.


Figure 22: QROWD CDDA-1 Upload to Context broker NiFi processors

The detail of the dataflow shown in Figure 22 is as follows:

- “Evaluate JSON path” (standard), processor in charge of getting the id of the entity.
- “Invoke HTTP” (standard), processor in charge of checking whether an entity with this id already exists in the Context Broker. If the answer is:
    - “Yes”, the following processors are in charge of updating the existing entity in the Context Broker:
        - “RemoveIDForUpdate” (standard), processor in charge of removing the “id” from the entity,
        - “PostUpdateToFiware” (standard), processor in charge of updating an existing FIWARE entity (sent in the request body) in the Context Broker by means of the NGSI API. The id is sent as a path parameter.
    - “No”, the following processors are in charge of posting a new entity to the Context Broker:
        - “JoltTransformation” (standard), processor in charge of adding the FIWARE type to the entity.
        - “POSTNewEntityToFiware” (standard), processor in charge of uploading a new FIWARE entity (sent in the request body) to the Context Broker by means of the NGSI API. The id is sent in the entity itself.
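
The check-then-update-or-create logic described above can be sketched outside NiFi as follows. This is a minimal Python illustration under stated assumptions, not the template itself: the function names, the injected `send` callable and the use of `urllib` are choices made for this example, and only the three URL patterns (`/v2/entities/<id>`, `/v2/entities/<id>/attrs?options=keyValues`, `/v2/entities?options=keyValues`) come from the processor configuration described in this document.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import quote

# (method, url, body) -> HTTP status code. Hypothetical signature, injected so
# that the decision logic can be exercised without a running Context Broker.
Transport = Callable[[str, str, Optional[dict]], int]


def http_send(method: str, url: str, body: Optional[dict]) -> int:
    """Send one request with the standard library and return the status code."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, method=method, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 when the entity does not exist yet


def upsert_entity(broker: str, entity: dict, entity_type: str, send: Transport) -> str:
    """Update the entity if its id is already known to the broker, else create it."""
    entity_id = quote(entity["id"], safe="")
    # Role of "Invoke HTTP": does an entity with this id already exist?
    status = send("GET", f"{broker}/v2/entities/{entity_id}", None)
    if status == 200:
        # Role of "RemoveIDForUpdate": the id travels in the path, not the body.
        attrs = {k: v for k, v in entity.items() if k not in ("id", "type")}
        send("POST", f"{broker}/v2/entities/{entity_id}/attrs?options=keyValues", attrs)
        return "updated"
    # Role of "JoltTransformation": add the FIWARE type before creating.
    new_entity = {**entity, "type": entity_type}
    send("POST", f"{broker}/v2/entities?options=keyValues", new_entity)
    return "created"
```

A call such as `upsert_entity("http://<Context_broker>", entity, "OffStreetParking", http_send)` would exercise the whole path, with the placeholder replaced by the address where the Context Broker is running.
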
4.5. Guided Procedure

Figure 23: CDDA-1 NiFi template

- Step 1: Go to the “ConvertToFiware” group and follow the first five steps of section CSDA-3: JSON-based transformations + Update a dataset into CKAN.
- Step 2: Go to the “Copy of FiwareRESTAPIHandler” group, open it by double-clicking, and complete the following steps:
    - Step 3: Go to the “EvaluateJsonPath” processor, open the “properties” tab and fill in:
        - Mandatory:
            - “id”: a JSON path expression that retrieves the id of the new Context Broker entity. It is needed to check whether the entity already exists in the Context Broker.
    - In case of an update:
        - Step 4: Go to the “Invoke HTTP” processor, open the “properties” tab and fill in:
            - Mandatory:
                - In the “Remote URL” property, replace the <Context_broker> placeholder in <Context_broker>/v2/entities/${ID} with the IP where the Context Broker is running.
        - Step 5: Go to the “RemoveIDForUpdate” processor, open the “properties” tab and fill in:
            - Mandatory:
                - “Search Value”: the regular expression that removes the id from the entity.
        - Step 6: Go to the “PostUpdateToFiware” processor, open the “properties” tab and fill in:
            - Mandatory:
                - In the “URL” property, replace the <Context_broker> placeholder in <Context_broker>/v2/entities/${ID}/attrs?options=keyValues with the IP where the Context Broker is running.
    - In case of a new upload:
        - Step 7: Go to the “JoltTransformationJSON” processor, click “Advanced” (button on the left side) and enter the mandatory JOLT specification that transforms the input JSON structure into another JSON structure. A reference can be found here 23. A new flowfile is created with the new structure.
        - Step 8: Go to the “PostNewEntityToFiware” processor, open the “properties” tab and fill in:
            - Mandatory:
                - In the “URL” property, replace the <Context_broker> placeholder in <Context_broker>/v2/entities?options=keyValues with the IP where the Context Broker is running.
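
To make the values requested in Steps 3, 5 and 7 more concrete, the following Python sketch shows one possible set of values and emulates their effect on a sample flowfile. All three values are assumptions for illustration: the JSON path assumes the id is stored in a top-level `id` field, the regular expression assumes a string-valued `id` that is not the last key of the object, and the JOLT specification uses the `default` operation to add an example `type`. The helper functions are hypothetical and only mimic what the corresponding processors do.

```python
import json
import re

# Hypothetical property values; adapt them to the actual dataset.
ID_JSON_PATH = "$.id"  # Step 3: "id" property of EvaluateJsonPath
SEARCH_VALUE = r'"id"\s*:\s*"[^"]*"\s*,\s*'  # Step 5: "Search Value"
JOLT_SPEC = [  # Step 7: JOLT specification that adds the FIWARE type
    {"operation": "default", "spec": {"type": "OffStreetParking"}}
]


def remove_id(flowfile_text: str) -> str:
    """Emulate Step 5: delete the first "id" member from the JSON text."""
    return re.sub(SEARCH_VALUE, "", flowfile_text, count=1)


def apply_default(entity: dict, spec: dict) -> dict:
    """Emulate only the top-level effect of a JOLT "default" operation."""
    result = dict(entity)
    for key, value in spec.items():
        result.setdefault(key, value)  # existing keys are left untouched
    return result


flowfile = '{"id": "OffStreetParking:P1", "availableSpotNumber": 57}'

# Body sent by the update branch (the id goes in the URL path instead).
update_body = json.loads(remove_id(flowfile))

# Body sent by the creation branch (the id stays, the type is added).
new_entity = apply_default(json.loads(flowfile), JOLT_SPEC[0]["spec"])
```
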

4.6. Deployment of the framework

Deployment

The following components will be deployed on the QROWD servers:

- Basic templates
    - CDDA-5: JSON-based transformations + Update to Context Broker
- Two applications of the previous template
    - Bike-sharing acquisition dataflow
    - Underground parking availability

Release

The CSDA templates developed are released under the Apache 2.0 licence in the following QROWD git repository:
- CSDA templates 24, a collection of templates created for the QROWD project to be used in NiFi.

23 http://jolt-demo.appspot.com/#inception
24 https://github.com/QROWD/NiFi-templates


4.7. Dynamic Requirements Validation

D4.1 defined a set of requirements to be fulfilled by the dynamic acquisition framework. Table 3 and Table 4 show the assessment of the requirements’ coverage.

Functional requirements

Table 3: Dynamic requirements validation (FR)


| ID | Requirement name | Description | Notes | Validation | Validation notes |
|---|---|---|---|---|---|
| DA-201 | Acquisition of last measured data | Transfer the latest state of dynamic data from external data sources to a central broker/repository. | e.g. make available to the QROWD platform the last state of weather forecast entities obtained from the Open Data Trentino portal | A | The CDDA-5 template allows information to be retrieved in real time and the last status to be stored in the Context Broker |
| DA-202 | Currency factor | The acquisition tool(s) should be able to provide the desired latency in updating the data | e.g. the system should update and make available weather forecast information for Trento every 15 minutes. | A | The entry point NiFi processor (InvokeHTTP) of CDDA-5 allows scheduling (different latency in updating the dataset) |
| DA-203 | Timeliness factor | Different datasets may have different rates of change. | e.g. every fifteen minutes, hourly, twice per day, daily, every 15 days, quarterly, yearly. | A | The entry point NiFi processor (InvokeHTTP) of CDDA-5 allows selecting different elapsed amounts of time between updates |
| DA-204 | File format access type | The system should be able to acquire data in different file formats or access schemes, such as flat files, database dumps or SQL interfaces. The list of access types is driven by the characteristics of the datasets. | e.g. XML, CSV-GTF, CSV, GML/SHP/KML/DXF, JSON, SQL dumps, SQL interfaces, etc. | NA | In the end, the requirements of the project did not call for acquiring datasets from databases |
| DA-205 | Variety of data sources | The system should be able to quickly and easily integrate and expose data from a variety of data sources using APIs. | | A | |

Non-functional requirements

Table 4: Dynamic requirements validation (NFR)

| ID | Requirement name | Description | Notes | Validation | Validation notes |
|---|---|---|---|---|---|
| DA-206 | OASC compliant | The system, or part of it, should be compliant with the NGSI API standard and FIWARE data models | e.g. NGSI API, FIWARE data model | A | The CDDA-5 template is compliant with the NGSI API |
| DA-207 | Scalability | Building a scalable infrastructure able to handle the huge datasets used in QROWD. | e.g. import a dump file of 19,5 GB | A | The dynamic acquisition template is built with NiFi (see footnote 25), a scalable technology for scalable data flows |
| DA-208 | Flexibility | The system should be able to incorporate unknown, new or changing datasets. | e.g. the load of a new dataset should be carried out with the fewest possible changes | A | Each of the acquisition templates allows a new dataset to be incorporated just by adjusting parameters |

25 https://nifi.apache.org/


5. CONCLUSIONS
The document presented the main building blocks and functionalities offered by the
QROWD Data Acquisition Framework. The objective of the document was to present
the work done in the scope of WP4 for data acquisition, including software and
methodological support to enable data ingestion and transformation to fulfil the
requirements of the pilots of the project.

The users of the framework are mainly QROWD developers, but also any developer of data-enabled applications who would like to ingest data into CKAN or the Orion Context Broker. The framework can therefore be used to enable the acquisition of data in the scope of QROWD, but also on its own to enable data acquisition into the abovementioned repositories.

The main results presented in the document enable the acquisition of both static and dynamic datasets. The document describes a set of data flows created by combining Apache NiFi templates generated in the scope of QROWD. Developers can use and combine the templates in the NiFi GUI to define actual dataflows, allowing complex data acquisition pipelines to ingest static data into CKAN or dynamic data into the FIWARE Orion Context Broker with little or no programming effort.

The document provides hints and describes best practices to facilitate the creation of
the data flows and perform simple data transformations in NiFi. The document shows
how this is done to generate datasets in multiple formats and flavours and upload
them to CKAN in the case of static data, or to the Context Broker in the case of
dynamic data. A specific dataset versioning mechanism in CKAN has been
implemented to enable the functionality needed in QROWD.

The document is accompanied by the deployment of the software of the Data Acquisition Framework on the QROWD server located at the InfAI premises in Leipzig. Most of the code will be made available under an open source license.


6. REFERENCES
Apache NiFi. Website, Available at https://nifi.apache.org/. Accessed October 31, 2018

CKAN. Website, Available at https://ckan.org. Accessed October 31, 2018

FIWARE Context Broker. Website, Available at https://fiware-orion.readthedocs.io/en/master/index.html. Accessed October 31, 2018

FIWARE Data Models. Website, Available at https://www.fiware.org/developers/data-models/. Accessed October 31, 2018

FIWARE NGSI-9 Open RESTful API Specification. Website, Available at https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/FI-WARE_NGSI-9_Open_RESTful_API_Specification. Accessed October 31, 2018

FIWARE NGSI-9/NGSI-10 information model. Website, Available at https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/NGSI-9/NGSI-10_information_model. Accessed October 31, 2018

H. Butler, M. Daly, A. Doyle, et al. 2016. “The GeoJSON Format”, August 2016

OASC. Website, Available at http://oascities.org/. Accessed October 31, 2018

OpenStreetMap. Website, Available at https://www.openstreetmap.org/. Accessed October 31, 2018

World Wide Web Consortium, “RDF 1.1 Concepts and Abstract Syntax”, Feb 2014


7. ANNEX 1: DEPLOYED DATAFLOWS


The following table shows the data flows deployed on the QROWD NiFi server and related information.

| Data flow | Dataset | Template | Description | File | Original URL | CKAN URL |
|---|---|---|---|---|---|---|
| 1016 Cycle paths in Trentino | 1016 | CSDA-2, CSDA-4, CSDA-3 | Cycle paths of the Trentino region | piste-ciclabili-provinciali-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://www.territorio.provincia.tn.it/geodati/1457_Piste_ciclabili_12_12_2011.zip | http://ckan.qrowd.aksw.org/dataset/piste-ciclabili-provinciali-mt |
| 1019 Cycle paths in Trento | 1019 | CSDA-2, CSDA-4, CSDA-3 | Cycle paths of the Trento city | piste-ciclabili-comunali-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=piste_ciclabili&fr=gml | http://ckan.qrowd.aksw.org/dataset/piste-ciclabili-comunali-mt |
| 1022 zone-parcheggio | 1022 | CSDA-2, CSDA-4, CSDA-3 | Restricted area of parking in Trento | zone-parcheggio-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=zone_parcheggio&fr=gml | http://ckan.qrowd.aksw.org/dataset/zone-parcheggio-mt |
| 1023 zone_traffico_limitato | 1023 | CSDA-2, CSDA-4, CSDA-3 | Traffic limited area of parking in Trento | zone-traffico-limitato-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=zone_traffico_limitato&fr=gml | http://ckan.qrowd.aksw.org/dataset/zone-traffico-limitato-mt |
| 1024 parcheggi_disabili | 1024 | CSDA-2, CSDA-4, CSDA-3 | Parking for people with disabilities | parcheggi-disabili-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=parcheggi_disabili&fr=gml | http://ckan.qrowd.aksw.org/dataset/parcheggi-disabili-mt |
| 1026 impianti_sportivi | 1026 | CSDA-2, CSDA-4, CSDA-3 | Trento sport facilities | impianti-sportivi-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=impianti_sportivi&fr=gml | http://ckan.qrowd.aksw.org/dataset/impianti-sportivi-mt |
| 1027 centro_in_bici | 1027 | CSDA-2, CSDA-4, CSDA-3 | Bike sharing stations | centro-in-bici-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=centro_in_bici&fr=gml | http://ckan.qrowd.aksw.org/dataset/centro-in-bici-mt |
| 1028 nidi | 1028 | CSDA-2, CSDA-4, CSDA-3 | Nursery schools | nidi-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=nidi&fr=gml | http://ckan.qrowd.aksw.org/dataset/nidi-mt |
| 1029 Scuole_Infanzia | 1029 | CSDA-2, CSDA-4, CSDA-3 | Kindergartens | scuole-infanzia-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=materne&fr=gml | http://ckan.qrowd.aksw.org/dataset/scuole-infanzia-mt |
| 1030 elementari | 1030 | CSDA-2, CSDA-4, CSDA-3 | Elementary schools | elementari-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=elementari&fr=gml | http://ckan.qrowd.aksw.org/dataset/elementari-mt |
| 1031 medie | 1031 | CSDA-2, CSDA-4, CSDA-3 | Secondary schools | medie-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=medie&fr=gml | http://ckan.qrowd.aksw.org/dataset/medie-mt |
| 1032 biblioteche | 1032 | CSDA-2, CSDA-4, CSDA-3 | Municipal libraries | biblioteche-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=biblioteche&fr=gml | http://ckan.qrowd.aksw.org/dataset/biblioteche-mt |
| 1033 circoscrizioni | 1033 | CSDA-2, CSDA-4, CSDA-3 | Districts of Trento | circoscrizioni-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=confini&ly=circoscrizioni&fr=gml | http://ckan.qrowd.aksw.org/dataset/circoscrizioni-mt |
| 1041 taxi_stations | 1041 | CSDA-1 | Taxi stations | taxi-stations-mt. Acquisition from Trento Municipality | https://os.smartcommunitylab.it/core.mobility/getTaxiStation/ | http://ckan.qrowd.aksw.org/dataset/taxi-stations-mt |
| 1045 post_offices | 1045 | CSDA-2, CSDA-4, CSDA-3 | Post offices of Trento | uffici-postali-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=uffici_postali&ly=uffici_postali&fr=gml | http://ckan.qrowd.aksw.org/dataset/uffici-postali-mt |
| 1052 parking_meter | 1052 | CSDA-3 | Parking meters of Trento | parking-meter-mt | http://ckan.qrowd.aksw.org/dataset/e0b8e0df-cf14-466f-b424-60b46487930f/resource/c8d92284-b859-408d-83cc-8425f28fe8cf/download/parking-meter-mt_geojson.geojson | http://ckan.qrowd.aksw.org/dataset/parking-meter-mt |
| 1058 bikeracks | 1058 | CSDA-2, CSDA-4, CSDA-3 | Bike racks in LTZ area | bikeracks-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=rastrelliere&fr=shp | http://ckan.qrowd.aksw.org/dataset/bikeracks-mt |
| 1060 superiori | 1060 | CSDA-2, CSDA-4, CSDA-3 | High schools | superiori-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=istruzione_sport&ly=superiori&fr=shp | http://ckan.qrowd.aksw.org/dataset/superiori-mt |
| 1061 car_sharing | 1061 | CSDA-2, CSDA-4, CSDA-3 | Carsharing Trentino parking slots | car-sharing-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=car_sharing&fr=shp | http://ckan.qrowd.aksw.org/dataset/car-sharing-mt |
| 1063 bike_shelters | 1063 | CSDA-2, CSDA-4, CSDA-3 | Bike shelters | bike-shelters-mt. Acquisition from Trento Municipality, GeoJSON transformation, FIWARE | http://webapps.comune.trento.it/cartografia/gis/dbexport?db=base&sc=mobilita&ly=parcheggio_protetto_bike&fr=shp | http://ckan.qrowd.aksw.org/dataset/bike-shelters-mt |
| Pollution | Pollution | CSDA-1 | Air quality | airquality-mt | https://appa.alpz.it/aria/opendata/json/last/2 | http://ckan.qrowd.aksw.org/dataset/airquality-mt |

New datasets added in v0.2 of the deliverable

| Data flow | Dataset | Template | Description | File | Original URL | CKAN URL |
|---|---|---|---|---|---|---|
| 1053 parking_street_segments | 1053 | CSDA-3 | Parking street segments of Trento | parking-street-segments-mt | http://ckan.qrowd.aksw.org/dataset/2617963d-168e-49f4-9212-899a4cf12022/resource/88b0d249-c898-4657-9609-c3853456e766/download/parklatlon.geojson | http://ckan.qrowd.aksw.org/dataset/parking-street-segments-mt |
| 1056 Neighborhoods_valley_Trento | 1056 | CSDA-3 | Neighborhoods valley Trento | neighborhoods-Trento-mt | http://ckan.qrowd.aksw.org/dataset/a2c95287-fcf8-47cf-8797-ada82580f8b2/resource/dc4ca58a-303d-45a7-9c63-ab1842b0dda0/download/neighborhoodsutmlatlon.geojson | http://ckan.qrowd.aksw.org/dataset/neighborhoods-trento-mt |
| 1059 Parking_RTZ_area | 1059 | CSDA-3 | Parking RTZ area | parking-slots-rtz-area-mt | http://ckan.qrowd.aksw.org/dataset/c6dccbde-a992-4dc2-bf4c-61cc88a0a396/resource/dfb61773-0306-46d8-853c-787d9c67bb1a/download/stallilatlon.geojson | http://ckan.qrowd.aksw.org/dataset/parking-slots-rtz-area-mt |
| 1062 charging_stations | 1062 | CSDA-3 | Charging stations | e-car-charging-stations-mt | http://ckan.qrowd.aksw.org/dataset/db637973-e57f-4874-b670-0396c1a5a28f/resource/343f327d-4db6-4afb-bc0e-53c562352024/download/carchargestation.geojson | http://ckan.qrowd.aksw.org/dataset/e-car-charging-stations-mt |
| 4001_Bike_Racks_VCE | 4001 | CSDA-1 | Bike racks vce | bike-racks-vce | http://waisvm-rcg1v07.ecs.soton.ac.uk/vce/4001_Bike_Racks_VCE_GeoJSON.json | http://ckan.qrowd.aksw.org/dataset/bike-racks-vce |
| 4002_Bike_Racks_OSM | 4002 | CSDA-1 | Bike racks osm | bike-racks-osm | http://waisvm-rcg1v07.ecs.soton.ac.uk/vce/4002_Bike_Racks_OSM_GeoJSON.json | http://ckan.qrowd.aksw.org/dataset/bike-racks-osm |
| TrentoNord Sensor | A22-Nord | CSDA-3 | Trento Sensors A22 Nord | trentonord-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/RDT/ByHour | http://ckan.qrowd.aksw.org/dataset/trentonord-mt |
| TrentoSud Sensor | A22-Sud | CSDA-3 | Trento Sensors A22 Sud | trentosud-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/RDT/ByHour | http://ckan.qrowd.aksw.org/dataset/trentosud-mt |
| TrentoSS47 Sensor | SS47 | CSDA-3 | Trento Sensors SS47 | trentoss47-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/RDT/ByHour | http://ckan.qrowd.aksw.org/dataset/trentoss47-mt |
| TrentoPiedi Sensor | Piedicastello | CSDA-3 | Trento Sensors Piedicastello | trentopiedi-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/RDT/ByHour | http://ckan.qrowd.aksw.org/dataset/trentopiedi-mt |
| TrentoBikeSud Sensor | Bike Sud | CSDA-3 | Trento Sensors Bike Sud | trentobikesud-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/Narx/ByHour | http://ckan.qrowd.aksw.org/dataset/trentobikesud-mt |
| TrentoBikeNord Sensor | Bike Nord | CSDA-3 | Trento Sensors Bike Nord | trentobikenord-mt | https://api-test.smartcommunitylab.it/trento.mobilitysensordata/1.0.0/traffic/Narx/ByHour | http://ckan.qrowd.aksw.org/dataset/trentobikenord-mt |
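
Many of the original URLs listed in this annex share the same `dbexport` endpoint and differ only in their query parameters, while the CKAN URLs append the dataset name to a common prefix. The following Python sketch reproduces those two patterns as they appear in the table. The function names are hypothetical, and reading `sc`, `ly` and `fr` as schema, layer and format is an assumption based on the values in the table, not on documentation of the Trento service.

```python
from urllib.parse import urlencode

# Both prefixes are copied from the URLs listed in the table.
DBEXPORT = "http://webapps.comune.trento.it/cartografia/gis/dbexport"
CKAN_DATASETS = "http://ckan.qrowd.aksw.org/dataset/"


def dbexport_url(schema: str, layer: str, fmt: str, db: str = "base") -> str:
    """Compose an export URL; the parameter meanings are assumed, see above."""
    query = urlencode({"db": db, "sc": schema, "ly": layer, "fr": fmt})
    return f"{DBEXPORT}?{query}"


def ckan_dataset_url(name: str) -> str:
    """Compose the CKAN dataset page URL from the dataset name."""
    return CKAN_DATASETS + name


# Rebuild the two URLs of the "1027 centro_in_bici" row.
source_url = dbexport_url("mobilita", "centro_in_bici", "gml")
target_url = ckan_dataset_url("centro-in-bici-mt")
```

Such helpers could be used to check the rows of the table for typing errors before configuring a new dataflow.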
