Managing Knowledge in Energy Data Spaces
arXiv:2107.01965v1 [cs.DB] 5 Jul 2021
Valentina Janev1,2, Maria-Esther Vidal3,4, Kemele Endris3,4, and Dea Pujić1,2
1 The Mihajlo Pupin Institute, Serbia
2 University of Belgrade, Serbia
3 L3S Research Center, Leibniz University of Hannover, Germany
4 TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
valentina.janev@institutepupin.com, maria.vidal@tib.eu
Abstract Data in the energy domain grows at unprecedented rates and is
usually generated by heterogeneous energy systems. Despite the great potential
that big data-driven technologies can bring to the energy sector, general adoption is still lagging. Several challenges related to controlled data exchange and data integration remain unresolved. As a result, fragmented applications are developed against energy data silos, and data exchange is limited to a few applications. In this paper, we analyze the challenges and requirements of energy-related data applications. We also evaluate the use of Energy Data Ecosystems (EDEs) as data-driven infrastructures to overcome the current limitations of fragmented energy applications. EDEs are inspired by the International Data Spaces (IDS) initiative, launched in Germany at the end of 2014 with the overall objective of taking both the development and use of the IDS reference architecture model to a European and global level. The reference architecture model consists of four architectures related to business, security, data and service, and software aspects. This paper illustrates the applicability of
EDEs and IDS reference architecture in real-world scenarios from the energy
sector. The analyzed scenario is positioned in the context of the EU-funded
H2020 project PLATOON.
Keywords Data Integration Systems, Energy Big Data, Knowledge Graphs,
Data Exchange, Semantic Interoperability
1
Introduction
Big data is recognized as a relevant asset, and big data-driven applications are increasingly devised in several domains and disciplines. In the energy sector, big data has been presented as a digital technology for understanding how energy is produced and consumed and how these patterns may impact our lives and economies [16]. Despite being recognized as crucial for efficiently generating and consuming energy, big data applications in the energy domain are still underdeveloped and fragmented. Energy big data is collected from heterogeneous data sources, which include wind farms, solar systems, conventional power plants, cooling, heating, and lighting systems, as well as smart grids. These sources provide measurements in different domains, e.g., energy consumption, energy generation, system outages, failures, weather, and energy transmission. Moreover, these data sources are characterized by the dominant Big Data dimensions, i.e., volume, velocity, variety, veracity, and value. Furthermore, interoperability and heterogeneity problems are usually caused by the various representations and interpretations of the data ingested from the data sources. These characteristics put in perspective the data complexity issues that need to be tackled in the energy sector. This paper states the requirements to be fulfilled during data sharing and integration in order to scale up to large data sets and solve heterogeneity and quality issues.
Data interoperability is defined as the process of providing uniform access to a set of distributed (or decentralized), autonomous, and heterogeneous data sources [6]. Data integration systems (DISs) integrate data sets; they offer a global schema (also known as a mediated schema) that gives a unified view of all data available in the integrated data sources. DISs can also produce a materialized version of the integrated data sources, i.e., a data warehouse. The unified schema serves not only to explicitly define the underlying data elements (thus achieving syntactic interoperability), but also to assign unambiguous, shared meaning to processed data, i.e., to reach semantic interoperability. Semantic interoperability has been a key consideration in information systems design over the last two decades, and its importance has been widely recognised [5].
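The role of a global (mediated) schema can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation from this work: the two source layouts, all field names, and the kW-to-MW conversion are hypothetical and were chosen to show how per-source mappings produce one unified view.

```python
# Hypothetical sketch: two sources describe the same kind of measurement
# with different field names and units; a global schema unifies them.

# Global (mediated) schema: every unified record exposes these attributes.
GLOBAL_SCHEMA = ("plant_id", "timestamp", "active_power_mw")

# Per-source mappings: global attribute -> function reading a local record.
MAPPINGS = {
    "wind_farm_csv": {
        "plant_id": lambda r: r["turbine"],
        "timestamp": lambda r: r["ts"],
        "active_power_mw": lambda r: r["power_kw"] / 1000.0,  # kW -> MW
    },
    "scada_archive": {
        "plant_id": lambda r: r["unit_name"],
        "timestamp": lambda r: r["time"],
        "active_power_mw": lambda r: r["p_mw"],
    },
}


def to_global(source, record):
    """Express one local record in terms of the global schema."""
    mapping = MAPPINGS[source]
    return {attr: mapping[attr](record) for attr in GLOBAL_SCHEMA}


def unified_view(sources):
    """Materialize the unified view over all records of all sources."""
    return [to_global(name, rec) for name, recs in sources.items() for rec in recs]


# Made-up sample records, for illustration only.
sources = {
    "wind_farm_csv": [{"turbine": "wt-01", "ts": "2020-01-01T00:00", "power_kw": 2500}],
    "scada_archive": [{"unit_name": "tpp-A", "time": "2020-01-01T00:00", "p_mw": 310.0}],
}
view = unified_view(sources)
```

Every record in `view` carries the same attribute names and the same unit, so an application can be written once against the global schema instead of once per source.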
Several initiatives have been proposed to develop effective and scalable semantic interoperability towards data spaces. In the context of the European Data Strategy [9] and the proposed Regulation on European Data Governance [10], a vision has been created for trusted data intermediaries for B2B data sharing and common European data spaces in crucial sectors such as health, environment, energy, agriculture, mobility, finance, manufacturing, public administration, and skills. The International Data Spaces (IDS) initiative is another effort to enable controlled data exchange and integration [2]. IDS proposes various standards, technologies, and governance models to facilitate secure and standardized data exchange and integration. Moreover, IDS provides building blocks for the development of data-driven services, while data sovereignty for data providers is guaranteed. Lastly, Data Ecosystems (DEs) [4, 19] are infrastructures that allow for data exchange across different stakeholders; they are equipped with data integration techniques and data management methods to preserve data privacy and security. DEs facilitate the creation of data markets for ensuring competitiveness and data sovereignty.
This paper analyzes the requirements for energy data exchange and illustrates, with a real-world use case, interoperability issues that may exist across energy data sources. Furthermore, DEs for energy data management are presented as reference architectures to address the challenges of the energy sector.
Figure 1: Electricity Market - Information flows
The paper begins with preliminaries related to recent data sharing initiatives in Europe (Section 2). This is followed by a motivating scenario from the energy sector in Section 3. Our approach to Big Data management and analytics in the energy domain is introduced in Section 4. Section 5 presents proof-of-concept results, and the observed outcomes are discussed in Section 6. Related work is analyzed in Section 7, and Section 8 summarizes our conclusions and presents an outlook on our future work.
2
Preliminaries
Data Ecosystems comprise various computational methods to overcome interoperability issues while preserving data privacy, security, and sovereignty. They can be aligned with international data strategies, e.g., the European Data Strategy [9], and thus represent crucial technological building blocks for digitalization and data markets, as well as for enhancing competitiveness and digital sovereignty. A DE can be centralized; in that case, it maintains shared data sources and hosts services on top of these sources. However, whenever data privacy policies regulate data exchange, a materialized integration of the data is impossible. In this case, several DEs can be interconnected into a DE network [4]. As an individual DE, each node maintains and exchanges data; it can also perform data management and analytical tasks. DEs resort to semantic data models for providing a uniform view of heterogeneous data sources. They also include mapping rules that define the data sources in terms of the semantic data models. Lastly, a DE can be enhanced with a meta-layer that describes business models, data access regulations, and data exchange contracts.
The International Data Spaces (IDS) [20] is an industrial initiative that follows the DE concept. The IDS reference architecture aims at i) data governance according to regulations imposed by data providers; ii) ensuring a trusted and secure data exchange; iii) semantically representing the main data concepts and relationships; iv) specifying exchange formats and protocols; and v) providing software design principles for guiding the implementation of the reference architecture components. IDS provides building blocks for the development of data-driven services, while data sovereignty for data providers is guaranteed. IDS proposes a message-based infrastructure to enable communication among the different nodes and components in a DE. Moreover, IDS resorts to Semantic Web standards to express the content and meaning of the shared data sources. The Resource Description Framework (RDF) and ontologies defined using RDF are proposed to specify metadata as well as data control and protection in a decentralized or federated DE. The IDS shared information model states standards for representing Content, Concept, Community of Trust, Commodity, and Communication. The proposed W3C standards include SHACL1 for expressing content and integrity constraints; SKOS2 for modeling concepts and relationships; and PROV3 for representing data and service provenance.
3
Motivating Scenario
One of the long-term objectives of the EU is the creation of a common market that will eliminate trade barriers between EU Member States. Studies [21] have shown that the policy was partially effective at the EU level (more so in high-income countries), taking into account the dynamization of the economy and the achieved environmental sustainability. The penetration of variable renewable energy sources (RES) in the electricity sector has increased significantly over the last decade, which has allowed renewable energy suppliers to boost their production and consumption. However, there is still a lack of progress in some countries, and overall the EU energy market remains rather fragmented into sub-markets with limited cross-border trade and competition.
3.1
Electricity Balancing and Commercial Flows on Country Level
Figure 1 gives a simplified illustration of electricity flows (blue arrows) and commercial flows (red arrows) between market participants. In reality, the electricity infrastructure and data exchange processes are very complex: the infrastructure consists of many energy systems (generation, transmission, and demand infrastructures). The data sources are related to wind power systems, solar power systems, conventional power plants, cooling, heating, and lighting systems, as well as smart grids. They represent measurements in different domains,
1 https://www.w3.org/TR/shacl/
2 https://www.w3.org/2004/02/skos/
3 https://www.w3.org/TR/prov-overview/
Figure 2: The Energy Big Data integration platform as a Data Ecosystem
e.g., energy consumption, energy generation, system outages, failures, weather, and energy transmission. These data sources are characterized by the dominant Big Data dimensions, i.e., volume, velocity, variety, veracity, and value. Modernization of the grid implies fast integration of RESs, adapted power system planning, new forecasting methods, more flexible use of power plants, standardized data exchange, increased transfer capacity, and more. Additionally, the volatile production of renewable energy sources creates particular challenges for the daily electricity balancing process, i.e., balancing the deviations between the planned or forecast production and demand, on the one side, and the actual performance in real time, on the other side [24]. While RES installations can be built relatively quickly, their integration occurs only when the independent producers ensure compliance with grid code requirements [3] and when the basic grid support services are in place.
Therefore, the integration of distributed variable generation from (independent) producers into the grid is an important subject that should be adequately addressed.
3.2
Data Exchange Requirements
Interoperability is the ability of two or more networks, systems, applications, components, or devices from the same or different vendors to exchange information and subsequently use it to perform required functions. Syntactic interoperability is the capability of two or more systems to communicate and exchange data; specified data formats (e.g., XML), communication protocols (e.g., TCP/IP), and the like are its fundamental tools. Semantic interoperability [18] is the ability of systems to exchange information with unambiguous meaning.
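The difference between the two levels can be illustrated with a small Python sketch. It is a hypothetical example, not taken from the pilots: the message layout, the field names, and the `unit` annotation are assumptions. Both messages parse with the same JSON parser (syntactic interoperability), but their values only become comparable once the unit is interpreted (a very simple stand-in for semantic interoperability).

```python
import json

# Conversion factors to a shared unit (MW); an assumed, minimal vocabulary.
UNIT_TO_MW = {"kW": 0.001, "MW": 1.0}


def parse(message):
    """Syntactic level: both systems agree on JSON as the exchange format."""
    return json.loads(message)


def power_in_mw(payload):
    """Semantic level: interpret the value using its declared unit."""
    return payload["power"] * UNIT_TO_MW[payload["unit"]]


# Two made-up messages reporting the same physical quantity.
msg_a = '{"plant": "wind-1", "power": 1500, "unit": "kW"}'
msg_b = '{"plant": "wind-1", "power": 1.5, "unit": "MW"}'

a, b = parse(msg_a), parse(msg_b)

# The raw numbers disagree although both messages are well-formed.
raw_values_equal = a["power"] == b["power"]

# After interpretation in a shared unit, the readings agree.
same_meaning = abs(power_in_mw(a) - power_in_mw(b)) < 1e-9
```

In a DE, this shared interpretation would be given by ontologies and mapping rules rather than by a hard-coded table.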
Regardless of the type of infrastructure (e.g., wind plant, photovoltaic power plant), it is necessary to enable a common understanding and message exchange between different energy stakeholders, covering raw data (base readings, measurements), processed information (e.g., forecasts, alerts), and market information. The data-driven frameworks motivated above must satisfy the following requirements whenever data is exchanged.
Transmission System Operator (TSO):
• RQ-1. For cross-border electric energy balancing, the TSO needs to exchange balancing plans.
• RQ-2. In order to plan capacities and cross-border electricity exchanges, the TSO receives bids from the Balancing Service Provider (BSP) about the corresponding volume of balancing energy for the duration of a contract.
• RQ-3. The TSO collects load information at different points of the grid it operates.
• RQ-4. Based on the balancing contracts, the collected bidding information, and the collected load, the TSO assesses the balancing needs and sends plans for producing electricity to the BRP.
Balancing Service Provider (BSP) and Balance Responsible Party (BRP):
• RQ-5. The BSP receives bids from producers (BRPs) about the expected realization (short, medium, and long term) in order to produce a more accurate day-ahead forecast.
• RQ-6. The BSP collects different meteorological data in order to plan the energy mix (activation and deactivation of conventional producers).
• RQ-7. The BRP and BSP prepare (short, medium, and long term) forecasts, publish the information in a transparent way, and send it to the TSO.
• RQ-8. The BRP and BSP collect infrastructure health information and send monitoring information to the TSO.
3.3
Interoperable Analytical Services Requirements
This subsection presents example scenarios where data-driven methods are required, i.e., analytical services for energy forecasting and predictive maintenance.
Transmission System Operator (TSO):
Load/Demand forecasting: Electricity demand forecasting is a central and integral process for planning periodical operations and facility expansion in the electricity sector. It involves accurate prediction of both the magnitudes and the geographical locations of electric load over the different periods of the planning horizon. Several factors are taken into consideration for load forecasting; they can be classified as time factors, economic factors, weather conditions, and customer factors.
Balance Responsible Party:
RES forecasting: RES (wind power) forecasting yields estimates of the variable power injected into the distribution grid. This allows predicting when the transformer connecting the distribution grid to the transmission grid will be overloaded, i.e., when local wind turbine generator production will be very high. One of the key challenges for day-ahead forecasting of wind energy remains unscheduled outages, which can have large effects on the forecasts for small systems, while their effect on the overall grid is small.
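How a wind power forecast translates into a transformer overload prediction can be sketched as follows. This is a simplified illustration under assumptions that are not stated in this paper: the overload criterion (forecast wind injection minus local load exceeding a fixed transformer rating), the function name, and all numbers are hypothetical.

```python
def overloaded_hours(wind_forecast_mw, local_load_mw, transformer_rating_mw):
    """Return the hours in which the net export towards the transmission
    grid (forecast wind injection minus local load) exceeds the rating."""
    hours = []
    for hour, (wind, load) in enumerate(zip(wind_forecast_mw, local_load_mw)):
        net_export = wind - load
        if net_export > transformer_rating_mw:
            hours.append(hour)
    return hours


# Made-up day-ahead values for four consecutive hours.
wind = [12.0, 30.0, 45.0, 50.0]   # forecast wind injection (MW)
load = [10.0, 12.0, 8.0, 20.0]    # forecast local load (MW)

# Net export per hour is 2, 18, 37, and 30 MW; with an assumed 25 MW rating,
# the last two hours are flagged.
flagged = overloaded_hours(wind, load, transformer_rating_mw=25.0)
```

A real service would work with forecast uncertainty rather than point values; the sketch only shows where the forecast enters the decision.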
Predictive maintenance: The continuous monitoring of asset performance
generates input that can be used for predictive analytics and to provide early
warnings of component/object failures (e.g., RES plant/component). Identifying problems before they occur helps to reduce unscheduled downtime, improve
plant maintenance and optimize asset performance.
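An early-warning rule of the kind described above can be sketched with a rolling z-score. The technique is a generic stand-in chosen for illustration, not the method used in the pilots; the window size, the threshold, and the sample readings are assumptions.

```python
from statistics import mean, stdev


def early_warnings(readings, window=5, threshold=3.0):
    """Flag indices whose reading deviates from the mean of the preceding
    `window` readings by more than `threshold` standard deviations."""
    warnings = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            warnings.append(i)
    return warnings


# Made-up bearing temperatures (degrees Celsius) with one abnormal spike.
temperatures = [60.0, 61.0, 59.5, 60.5, 60.2, 60.1, 75.0, 60.3]
alerts = early_warnings(temperatures)
```

Only the spike at index 6 is flagged here; the reading after it is not, because the spike inflates the standard deviation of its history window.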
4
Data Ecosystems for Energy Big Data Management and Analytics
Herein, we propose an Energy Big Data integration platform as an instantiation
of a Data Ecosystem (DE) [4], see Figure 2.
4.1
Energy Big Data Integration Platform
Figure 3: Multi-party data exchange based on IDS concept
An Energy Big Data integration platform is composed of several data integration platforms (one per Node i). Each node corresponds to a DE and can be integrated at the central level through mappings among nodes, data sharing, and service agreements. Each node (denoted by Node in Figure 2) applies a data integration process to a specific use case and can deploy its own services for query processing, analytics, and dashboards. Communication between nodes must be governed by an access agreement and can employ data connectors (IDS connectors) to secure data exchange according to data access contracts and regulations. Nodes have control over their data and may have data integrated in unified knowledge graphs. Moreover, each individual knowledge graph can be linked to knowledge graphs in other nodes, or to external knowledge graphs like DBpedia [1], Wikidata [25], or others in the Linked Open Data cloud4. Metadata is expressed using common semantic data models (CIM, DCAT, SKOS), and diverse mapping rule languages (e.g., RML or SPARQL) are utilized to define each pilot data set in terms of the semantic data models. This platform represents a decentralized infrastructure empowered with the components that pave the way for interoperability across stakeholders.
4.2
Instantiating a DE
The main features of the energy data integration platform are illustrated by the instantiation of a DE; for instance, in the Serbian pilot depicted in Figure 3, DEs shall be instantiated at
• the Producer's site (e.g., at a wind power plant, a unified knowledge graph shall be integrated with the production forecast and the predictive maintenance services);
• the Supplier's site, i.e., an organization that integrates data from many producers and sells electricity to the TSO (e.g., the Power Industry of Serbia might be interested in integrating the data sources from the power plants it owns and manages);
• the Transmission System Operator's site, i.e., an organization that operates and balances the grid (e.g., the Joint Stock Company EMS might be interested in improving the data integration and the transparency of data exchanged with other actors).
For instance, at the DE of the Transmission System Operator, four main data sources are currently available for integration: i) the Joint Stock Company (JSC) EMS Transparency platform5; ii) the ENTSO-E Transparency platform6; iii) meteorological data from WeatherBit7; and iv) data from the SCADA system (archive data for RES production and aggregated load)8.
Data operators for preprocessing, mapping, linking, transformation, and validation are applied to the pilot data sources for creating a materialized version of
the unified knowledge graph. The mappings between data sources and the target
ontology are part of the DE as well. Furthermore, mappings between concepts
from different ontologies are part of each DE. Data sources are also described in
terms of provenance and main properties; these descriptions are utilized for the
4 https://lod-cloud.net/
5 https://transparency.ems.rs/
6 https://transparency.entsoe.eu
7 https://www.weatherbit.io
8 http://www.pupin.rs/en/products-services/process-management/scada/
creation of a knowledge graph (e.g., by using SDM-RDFizer [14]) and during query processing (e.g., by using Ontario [8]). Links between entities in the knowledge graph and external data sources can be established by performing entity linking. Tools like Falcon2.0 [23] can be applied to link the pilots’ data sets with external knowledge graphs like DBpedia and Wikidata, while SHACL validation engines (e.g., Trav-SHACL [11]) enable the validation of integrity constraints. Lastly, the RDF knowledge graph will feed the semantic analytics engine SANSA [17] to perform knowledge discovery and prediction tasks.
5
Application: A Use Case
Although Serbia is not an EU country, its Energy Sector Development Strategy is based on the EU Energy Roadmap; one of the goals is to increase the RES share. Also, the information about the quantity of energy produced from RES has to be presented to the end user (guarantee of origin) in a document issued by the Distribution System Operator. Hence, the information from the producers, via suppliers and the TSO, shall be available to the end user (from Serbia and/or abroad) (Figure 3).
5.1
Developing a Global Schema for the Energy Domain
For the development of a global schema, different existing data models have been consulted and considered for reuse, such as
• the IEC Common Information Model (CIM) standards9;
• the Smart Appliances REFerence ontology (SAREF);
• the IDS Information Model10;
• the Smart Energy Aware Systems (SEAS) ontology11.
The selection has been based on a set of scenarios (electricity balancing services, predictive maintenance services, and services for the residential, commercial, and industrial sectors). In our analysis, we have used the semantic CIM model12. It is a canonical taxonomy in the form of packages of UML class diagrams referring to the components of power utility networks, with functional definitions and measurement types at a high degree of granularity (packages: Core, Topology, Wires, Generation, LoadModel, Outage, SCADA, ControlArea, and others). The concepts selected for reuse come from different packages. For instance, cim:PowerSystemResource (Core package) can be an item of equipment such as a Switch, or a cim:EquipmentContainer containing many individual
9 https://www.dmtf.org/standards/cim/cim_schema_v2530
10 https://international-data-spaces-association.github.io/InformationModel/docs/index.html
11 https://w3id.org/seas/
12 https://ontology.tno.nl/IEC_CIM/
Figure 4: Unified Knowledge Graph Creation Process
items of equipment such as a Substation. Each cim:PowerSystemResource is registered on the grid (cim:RegisteredResource) and belongs to a control area (cim:HostControlArea) that is operated by a Control Area Operator. The cim:ControlAreaOperator is responsible for stabilizing the system frequency (cim:Frequency); this task is also called frequency control.
Example - Load/Demand forecasting: The system is balanced by utilizing both supply and demand resources. However, the existing electric power systems were not initially designed to incorporate different kinds of generation technology (cim:Plant) at the scale that is required today. Historically, the system has been balanced mostly by directing thermal power plants to increase or reduce output (cim:ActivePower) in line with changes in demand.
Example - RES forecasting: Electricity production from solar and wind plants (cim:Plant), however, is subject to considerable forecast errors that drive the demand for balancing, i.e., for reserves (cim:ReserveReq). The amount of each reservation is defined by the Agreement (cim:Agreement) on the provision of system services signed between the transmission system operator and the balancing service provider (cim:BalanceSupplier).
Once the global schema has been developed, it can be used across the nodes established in the energy data ecosystem.
5.2
Unified Knowledge Graph Creation Process
In this section, two types of knowledge graph creation strategies are discussed:
materialized (i.e., data warehousing) and virtual (i.e., Data Lake). Both strategies are applicable for the use cases discussed above.
Materialized Knowledge Graph Creation Process: In a materialized knowledge graph creation process, data from individual data sources is transformed into RDF and stored in a physical database, the so-called triplestore. Figure 4 shows the data curation and integration sub-components for creating a unified knowledge graph. Input data from the Producers’ data sources is first stored in a raw data repository, i.e., a staging repository. Any preprocessing steps that are predefined for the input data, such as cleaning, normalization, and aggregation, are applied, and provenance is recorded. The data integrator component then orchestrates the knowledge graph creation process according to the data source’s configuration by invoking the Linking and Enrichment, RDFizer/Semantifier, and Data Validation sub-components, and finally integrates the data into the Supplier’s unified knowledge graph. The Linking and Enrichment component performs entity linking and enrichment using external as well as existing materialized knowledge graphs. The RDFizer/Semantifier component transforms non-semantic, i.e., raw, data into an RDF graph based on mapping rules. The Data Validation component checks conformance to data constraints.
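The materialized pipeline (preprocessing, RDFization with mapping rules, validation) can be sketched in plain Python. This is a toy illustration, not SDM-RDFizer or a SHACL engine: triples are modeled as tuples, the mapping-rule format and the integrity constraint are hypothetical, and the sample rows are made up. Only the `energy:` namespace and predicate names are borrowed from the query example in Section 5.4.

```python
ENERGY = "http://w3id.org/energy/"

# Hypothetical mapping rules: raw column name -> predicate IRI.
MAPPING_RULES = {
    "type": ENERGY + "productionType",
    "country": ENERGY + "country",
    "mw": ENERGY + "measure",
}


def preprocess(rows):
    """Cleaning step: strip whitespace and drop rows with missing values."""
    cleaned = []
    for row in rows:
        stripped = {key: value.strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def rdfize(rows):
    """RDFizer step: turn each raw row into triples using the mapping rules."""
    triples = set()
    for row in rows:
        subject = ENERGY + "capacity/" + row["id"]
        triples.add((subject, "rdf:type", ENERGY + "GenerationCapacity"))
        for column, predicate in MAPPING_RULES.items():
            triples.add((subject, predicate, row[column]))
    return triples


def validate(triples):
    """Validation step (assumed constraint): exactly one measure per subject."""
    measure = ENERGY + "measure"
    subjects = {s for s, _, _ in triples}
    return all(
        sum(1 for s, p, _ in triples if s == subj and p == measure) == 1
        for subj in subjects
    )


# Made-up staging data; the second row is incomplete and gets dropped.
raw_rows = [
    {"id": "c1", "type": " Wind ", "country": "RS", "mw": "100"},
    {"id": "c2", "type": "Solar", "country": "RS", "mw": ""},
]
graph = rdfize(preprocess(raw_rows))
is_valid = validate(graph)
```

In the actual platform, the set `graph` would be loaded into a triplestore, and entity linking would replace literal values such as `"Wind"` with IRIs of external knowledge graphs.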
Virtual Knowledge Graph Creation Process: In a virtual knowledge graph creation process, data remains in the sources (in raw format) and is accessed as needed at query time. The federated query processing component employs the data source descriptions stored in the metadata store to perform the integration at query time. Metadata about the number of data sources available, the provenance of the data sets, and the mapping rules for transforming data into an RDF graph is stored in a separate data store available to both the materialized and the virtual data integration processes. The federated query engine uses the SPARQL query language13 to access the unified knowledge graph, as described
in the following sections.
5.3
Traversing the Knowledge Graph
Once the knowledge graph creation process is established, the knowledge base can be explored via a query engine. If the materialization approach is applied and data is stored in a centralized triple store, e.g., Virtuoso, then the knowledge base can be accessed using SPARQL queries over the query engine embedded in the triple store. However, if the materialized knowledge base is large (in terms of volume), then partitioning and distribution are necessary to obtain timely responses from the query engine and to avoid the resource requirements of storing such large data on expensive servers.
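One simple way to partition a large knowledge base is to hash the subject of each triple, so that all statements about one entity stay on the same node. The text above does not prescribe a partitioning scheme; subject hashing, the use of CRC32, and the sample triples are assumptions made for this sketch.

```python
import zlib


def partition_for(subject, num_partitions):
    """Deterministically assign a subject IRI to a partition.
    CRC32 is used because Python's built-in hash() of strings varies per run."""
    return zlib.crc32(subject.encode("utf-8")) % num_partitions


def partition_triples(triples, num_partitions):
    """Distribute triples so that all triples of a subject share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for triple in triples:
        partitions[partition_for(triple[0], num_partitions)].append(triple)
    return partitions


# Made-up triples about two entities.
triples = [
    ("energy:c1", "energy:country", "RS"),
    ("energy:c1", "energy:measure", "100"),
    ("energy:c2", "energy:country", "RS"),
    ("energy:c2", "energy:measure", "200"),
]
parts = partition_triples(triples, num_partitions=3)
```

Subject-based partitioning keeps star-shaped queries (several properties of one entity) local to a single partition, while joins across different subjects may still require communication between nodes.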
5.4
Federated Query Processing
A federated query processing system provides a unified access interface to a set
of autonomous, distributed, and heterogeneous data sources. While distributed
query processing systems have control over each data set, federated query processing engines have no control over data sets in the federation, and data
13 https://www.w3.org/TR/rdf-sparql-query/
providers can join or leave the federation at any time and modify their data
sets independently. The role of federated query processing engines is to transform a query, i.e., the federated query, expressed in terms of the global schema
into an equivalent query expressed in the schema of the data sources, i.e., the
local query. The local query represents the federated query’s actual execution
plan by the federation’s data sources. An essential part of query processing in
the context of federated data sources is query optimization.
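Source selection and query decomposition, the first steps of this transformation, can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of Ontario or any other engine: source descriptions are reduced to sets of predicates (a hypothetical format), triple patterns are plain tuples, and query optimization is left out. The predicate names are those of the example query discussed in this section.

```python
# Simplified source descriptions: the predicates each source can answer.
SOURCE_DESCRIPTIONS = {
    "energy_kg": {"rdf:type", "energy:productionType", "energy:country",
                  "energy:measure", "energy:agg_year"},
    "wikidata": {"wdt:P279"},
}


def select_sources(pattern):
    """Return the sources whose description lists the pattern's predicate."""
    _, predicate, _ = pattern
    return [name for name, predicates in SOURCE_DESCRIPTIONS.items()
            if predicate in predicates]


def decompose(patterns):
    """Group the triple patterns into one subquery per relevant source."""
    subqueries = {}
    for pattern in patterns:
        for source in select_sources(pattern):
            subqueries.setdefault(source, []).append(pattern)
    return subqueries


federated_query = [
    ("?genCapacity", "rdf:type", "energy:GenerationCapacity"),
    ("?genCapacity", "energy:productionType", "?productionType"),
    ("?genCapacity", "energy:country", "?country"),
    ("?productionType", "wdt:P279", "wd:Q12705"),
]
subqueries = decompose(federated_query)
```

The variable `?productionType` occurs in the subqueries of both sources, which is what later makes it usable as a join variable.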
Example: Let us consider the following question, expressed below in SPARQL: “A list of countries, their renewable energy plants, and the respective installed generation capacity for the year 2020”.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX energy: <http://w3id.org/energy/>
SELECT DISTINCT ?country ?productionType ?measure
WHERE {
  ?genCapacity a energy:GenerationCapacity .
  ?genCapacity energy:productionType ?productionType .
  ?genCapacity energy:country ?country .
  ?genCapacity energy:measure ?measure .
  ?genCapacity energy:agg_year "2020" .
  ?productionType wdt:P279 wd:Q12705 .
}
To execute this query, a federation of knowledge graphs is needed, i.e., the Energy DE knowledge graph and external knowledge graphs like Wikidata14. The federated query engine maintains metadata about these knowledge graphs, and it is able to select them as relevant sources for the query. Once the knowledge graphs are selected, the federated query engine decomposes the query into subqueries SQ1 and SQ2 and executes them over the respective knowledge graphs. Query SQ1 is defined as follows; it is executed against the local knowledge graph.
PREFIX energy: <http://w3id.org/energy/>
SELECT DISTINCT ?country ?productionType ?measure
WHERE {
  ?genCapacity a energy:GenerationCapacity .
  ?genCapacity energy:productionType ?productionType .
  ?genCapacity energy:country ?country .
  ?genCapacity energy:measure ?measure .
  ?genCapacity energy:agg_year "2020" .
}
On the other hand, query SQ2 is defined and evaluated over Wikidata.
14 https://www.wikidata.org/
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?productionType
WHERE {
  ?productionType wdt:P279 wd:Q12705 .
}
The federated query engine performs a join on the results produced by the execution of queries SQ1 and SQ2; the values of the variable ?productionType are used as the join column. As a result, only one type of renewable energy plant is matched, i.e., wind power. It is important to highlight that without the integration of the transparency platform data and the linking of the corresponding production types with Wikidata, this query could not be executed.
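The join step can be sketched as a hash join on the shared variable. The choice of a hash join is an assumption made for illustration (the join operator used by the engine is not specified here), and the result rows below are made-up placeholders, not actual query answers.

```python
def hash_join(left, right, key):
    """Join two lists of solution mappings (dicts) on a shared variable."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Placeholder results of SQ1 (local knowledge graph); identifiers are invented.
sq1_results = [
    {"country": "RS", "productionType": "wd:WIND_PLACEHOLDER", "measure": "100"},
    {"country": "RS", "productionType": "local:UnlinkedType", "measure": "200"},
]

# Placeholder results of SQ2 (subclasses of renewable energy in Wikidata).
sq2_results = [
    {"productionType": "wd:WIND_PLACEHOLDER"},
    {"productionType": "wd:SOLAR_PLACEHOLDER"},
]

answers = hash_join(sq1_results, sq2_results, key="productionType")
```

Only the row whose production type is linked to an entity returned by SQ2 survives the join, which mirrors why entity linking with Wikidata is a precondition for answering the query.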
6
Discussion
The energy sector is an example where tremendous amounts of data are collected from numerous sensors, which are generally attached to different plant
subsystems. The new paradigm of DEs for smart grids that includes renewable
energy sources challenges the existing network infrastructure and the energy
management systems even more. DEs address the following challenges:
• definition of new approaches to data management and processing and
extending the service portfolio of various energy stakeholders in order to
achieve two-way flows of electricity;
• deploying distributed/edge processing and data analytics technologies to
optimize the operation of the real-time energy system management and
automate the “monitor-forecast-optimize-control” loop;
• implementation of effective integration of relevant digital technologies for
transforming the system from top-down centralized production and rigid
distribution framework to collaborative ecosystem of self-managed prosumers able to act independently on the liberalized energy markets.
In this paper, we showcased how a network of distributed data integration platforms can be instantiated in the energy value chain for establishing a “network of trusted data”. Some benefits for the main actors are:
• Secure data exchange: Using the IDS concept that features various levels of protection, data is exchanged securely across the entire data supply
chain (and not just in bilateral data exchange).
• Data governance and sovereignty: The data owner determines the terms and conditions of use of the data provided, while data sovereignty always remains with the respective Data Provider. The Provider makes data available to be requested by certain contractors in the Data Space under its own rules.
• Innovative, scalable, and replicable energy management services:
Data Spaces open opportunities for new data-driven and model-driven services that will complement and enhance existing ones, e.g., balancing services, intelligent energy generation and consumption forecasting
services, and energy performance assessment services.
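The data sovereignty benefit listed above can be illustrated with a minimal sketch, assuming that a Data Provider expresses its terms of use as a small set of machine-checkable conditions. The `UsagePolicy` and `DataRequest` classes, their fields, and the participant names are hypothetical and introduced only for illustration; they are not part of the IDS Information Model or of any IDS connector API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsagePolicy:
    """Terms of use set by a Data Provider (hypothetical structure)."""
    allowed_consumers: frozenset
    allowed_purposes: frozenset
    valid_until: date


@dataclass(frozen=True)
class DataRequest:
    """A request issued by a participant of the Data Space (hypothetical)."""
    consumer: str
    purpose: str
    requested_on: date


def is_request_allowed(policy: UsagePolicy, request: DataRequest) -> bool:
    """Grant access only if every condition of the provider's policy holds."""
    return (
        request.consumer in policy.allowed_consumers
        and request.purpose in policy.allowed_purposes
        and request.requested_on <= policy.valid_until
    )


# The provider, not the consumer, decides who may use the data and for what.
policy = UsagePolicy(
    allowed_consumers=frozenset({"grid-operator", "forecasting-service"}),
    allowed_purposes=frozenset({"load-forecasting"}),
    valid_until=date(2021, 12, 31),
)

ok = DataRequest("forecasting-service", "load-forecasting", date(2021, 7, 5))
denied = DataRequest("forecasting-service", "marketing", date(2021, 7, 5))

print(is_request_allowed(policy, ok))      # True
print(is_request_allowed(policy, denied))  # False
```

In this sketch the check is evaluated on the provider side before any data leaves its premises, which is one simple way to keep the decision with the Data Provider.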
7
Related Work
Gelhaar and Otto [12] highlight the value of data-driven solutions in the digitization era and outline the challenges that need to be addressed in DEs in emerging
areas like maritime, manufacturing, and science. Controlled, secure, and traceable
data exchange is among the most relevant challenges.
Several approaches have been defined that follow the DE architecture with
the aim of solving interoperability across heterogeneous data sets at query
processing time; they are usually called federated query engines. Exemplary
approaches include GEMMS [22], PolyWeb [15], BigDAWG [7], Constance [13],
and Ontario [8]. These systems collect metadata about the main characteristics
of their data sets, e.g., formats and query capabilities. Additionally, they resort
to a global ontology to describe contextual information and the relationships
among data sets, for purposes of optimized data integration, query processing, and automated schema discovery in quasi-central settings. Metadata have
been shown to be crucial for enabling these systems to perform query processing effectively. Knowledge-driven DEs build on these results and expose
the semantic descriptions of the data collections made available by stakeholders.
Furthermore, a DE empowers federated query processing engines with factual
statements about the integrity constraints satisfied by the data retrieved and
merged during query processing. As a consequence, a paradigm shift in
data management is envisioned towards tracing data integration during query
processing.
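The role of metadata in federated query processing can be illustrated with a minimal sketch of source selection, assuming that each data set is described by its format and by the properties of a global ontology it can answer. The `SourceDescription` class, the `select_sources` function, the `ex:` property names, and the three sources are hypothetical; the sketch does not reproduce the interface of GEMMS, PolyWeb, BigDAWG, Constance, or Ontario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """Metadata a federated engine keeps about one data set (illustrative)."""
    name: str
    data_format: str        # e.g. "RDF", "CSV", "JSON"
    predicates: frozenset   # global-ontology properties the source can answer


def select_sources(catalog, query_predicates):
    """Map each queried predicate to the sources whose metadata cover it."""
    return {
        predicate: [s.name for s in catalog if predicate in s.predicates]
        for predicate in query_predicates
    }


# Hypothetical catalog of three heterogeneous energy data sets.
catalog = [
    SourceDescription("scada_measurements", "CSV",
                      frozenset({"ex:hasPower", "ex:measuredAt"})),
    SourceDescription("asset_registry", "RDF",
                      frozenset({"ex:locatedIn", "ex:hasType"})),
    SourceDescription("weather_feed", "JSON",
                      frozenset({"ex:measuredAt", "ex:hasWindSpeed"})),
]

plan = select_sources(catalog, ["ex:hasPower", "ex:measuredAt", "ex:hasPrice"])
for predicate, sources in plan.items():
    print(predicate, "->", sources)
```

A predicate that maps to an empty list (here `ex:hasPrice`) tells the engine that no registered source can contribute to that part of the query, so the request can be rejected or rewritten before any source is contacted.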
8
Conclusion and Future Work
One of the requirements for data access procedures in Smart Grids and
future electricity markets is the interoperability of energy services. Therefore, the overall goal of this paper is to showcase and evaluate Data Ecosystems
and the IDS concept for the energy sector. In our work, we showed how DEs
provide the building blocks for enhancing the interoperability of energy management applications/services; they also enable the integration of energy data
in the European Energy Data Space. The metadata layer in DEs, together
with the internal SCADA information model, can be used as an information hub
(‘knowledge graphs’) for (1) building data connectors that will facilitate integration of services in future integrated energy systems and (2) improving the
explainability of machine learning services / analytical applications. The selection of models was based on a set of scenarios (electricity balancing
services, predictive maintenance services, and services for the residential, commercial,
and industrial sectors).
Acknowledgements
This work has been partially supported by the EU H2020 funded project PLATOON (GA No. 872592), the EU project LAMBDA (GA No. 809965), and
partly by the Ministry of Science and Technological Development of the Republic of Serbia (No. 451-03-9/2021-14/200034) and the Science Fund of the
Republic of Serbia (Artemis, No. 6527051).
References
[1]
Sören Auer et al. “DBpedia: A Nucleus for a Web of Open Data”. In: The Semantic Web,
6th International Semantic Web Conference, 2nd Asian Semantic Web Conference,
ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007. Ed. by Karl Aberer
et al. Vol. 4825. Lecture Notes in Computer Science. Springer, 2007, pp. 722–735. doi:
10.1007/978-3-540-76298-0_52. url: https://doi.org/10.1007/978-3-540-76298-0_52.
[2]
Sebastian R. Bader et al. “The International Data Spaces Information Model - An
Ontology for Sovereign Exchange of Digital Content”. In: The Semantic Web - ISWC
2020 - 19th International Semantic Web Conference. 2020.
[3]
R. Brundlinger. “Semantic Interoperability in Industry 4.0: Survey of Recent Developments and Outlook”. In: NEDO/IEA PVPS Task 14 Grid Code and RfG Workshop.
NEDO/IEA. 2019.
[4]
Cinzia Cappiello et al. “Data Ecosystems: Sovereign Data Exchange among Organizations
(Dagstuhl Seminar 19391)”. In: Dagstuhl Reports. Vol. 9. 9. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. 2020.
[5]
Jim Davies et al. “A formal, scalable approach to semantic interoperability”. In: Science
of Computer Programming 192 (2020), p. 102426. issn: 0167-6423. doi: 10.1016/j.scico.2020.102426.
url: https://www.sciencedirect.com/science/article/pii/S016764232030037X.
[6]
AnHai Doan, Alon Halevy, and Zachary Ives. Principles of data integration. Elsevier,
2012.
[7]
Jennie Duggan et al. “The BigDAWG Polystore System”. In: SIGMOD Rec. 44.2 (Aug.
2015), pp. 11–16. issn: 0163-5808. doi: 10.1145/2814710.2814713. url: http://doi.acm.org/10.1145/2814710.2814713.
[8]
Kemele M. Endris et al. “Ontario: Federated Query Processing Against a Semantic Data
Lake”. In: Database and Expert Systems Applications - 30th International Conference,
DEXA 2019, Linz, Austria, August 26-29, 2019, Proceedings, Part I. Ed. by Sven
Hartmann et al. Vol. 11706. Lecture Notes in Computer Science. Springer, 2019, pp. 379–395.
doi: 10.1007/978-3-030-27615-7_29. url: https://doi.org/10.1007/978-3-030-27615-7_29.
[9]
European Commission. A European Strategy for Data (19 February 2020, COM(2020)
66 final). https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en. 2020. (Visited on 02/19/2020).
[10]
European Commission. Proposal for a Regulation of the European Parliament and the
Council on European Data Governance (Data Governance Act, 25 November 2020,
COM/2020/767 final). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020PC0767. 2020. (Visited on 11/25/2020).
[11]
Mónica Figuera, Philipp D. Rohde, and Maria-Esther Vidal. “Trav-SHACL: Efficiently
Validating Networks of SHACL Constraints”. In: The WebConf WWW. 2021.
[12]
Joshua Gelhaar and Boris Otto. “Challenges in the Emergence of Data Ecosystems”.
In: 24th Pacific Asia Conference on Information Systems, PACIS. Ed. by Doug Vogel
et al. 2020, p. 175.
[13]
Rihan Hai, Sandra Geisler, and Christoph Quix. “Constance: An Intelligent Data Lake
System”. In: Proc. of the 2016 International Conference on Management of Data,
SIGMOD, San Francisco, USA. ACM, 2016, pp. 2097–2100. doi: 10.1145/2882903.2899389.
url: https://doi.org/10.1145/2882903.2899389.
[14]
Enrique Iglesias et al. “SDM-RDFizer: An RML interpreter for the efficient creation of
RDF knowledge graphs”. In: Proceedings of the 29th ACM International Conference
on Information & Knowledge Management. 2020, pp. 3039–3046.
[15]
Yasar Khan et al. “One Size Does Not Fit All: Querying Web Polystores”. In: IEEE
Access 7 (Jan. 2019), pp. 9598–9617.
[16]
Erik Kobayashi-Solomon. Using Big Data And The Power Of Markets To Solve Climate
Change. https://www.forbes.com/sites/erikkobayashisolomon/2020/08/07/using-big-data-and-the-power-of-markets-to-solve-climate-change/. 2020. (Visited on
08/07/2020).
[17]
Jens Lehmann et al. “Distributed Semantic Analytics Using the SANSA Stack”. In: The
Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna,
Austria, October 21-25, 2017, Proceedings, Part II. Ed. by Claudia d’Amato et al.
Vol. 10588. Lecture Notes in Computer Science. Springer, 2017, pp. 147–155.
doi: 10.1007/978-3-319-68204-4_15. url: https://doi.org/10.1007/978-3-319-68204-4_15.
[18]
J. Nilsson and F. Sandin. “Semantic Interoperability in Industry 4.0: Survey of Recent
Developments and Outlook”. In: 2018 IEEE 16th International Conference on Industrial Informatics (INDIN). IEEE. 2018, pp. 127–132.
[19]
Marcelo Iury S Oliveira and Bernadette Farias Lóscio. “What is a Data Ecosystem?”
In: Proceedings of the 19th Annual International Conference on Digital Government
Research: Governance in the Data Age. 2018, pp. 1–9.
[20]
Boris Otto et al. Reference Architecture Model for the Industrial Data Space.
https://www.fit.fraunhofer.de/content/dam/fit/en/documents/Industrial-Data-Space_Reference-Architecture-Model-2017.pdf. 2017. (Visited on 02/12/2021).
[21]
Pablo Ponce. “The Liberalization of the Internal Energy Market in the European Union:
Evidence of Its Influence on Reducing Environmental Pollution”. In: Energies 13 (2020),
p. 17.
[22]
Christoph Quix, Rihan Hai, and Ivan Vatov. “GEMMS: A Generic and Extensible
Metadata Management System for Data Lakes”. In: 28th International Conference on
Advanced Information Systems Engineering (CAiSE 2016). CEUR-WS, 2016, pp. 129–
136.
[23]
Ahmad Sakor et al. “Falcon 2.0: An Entity and Relation Linking Tool over Wikidata”.
In: CIKM ’20: The 29th ACM International Conference on Information and Knowledge
Management, Virtual Event, Ireland, October 19-23, 2020. Ed. by Mathieu d’Aquin et
al. ACM, 2020, pp. 3141–3148. doi: 10.1145/3340531.3412777.
url: https://doi.org/10.1145/3340531.3412777.
[24]
Reinier A.C. van der Veen and Rudi A. Hakvoort. “The electricity balancing market:
Exploring the design challenge”. In: Utilities Policy 43.B (2016), pp. 186–194.
[25]
Denny Vrandecic and Markus Krötzsch. “Wikidata: a free collaborative knowledgebase”.
In: Commun. ACM 57.10 (2014), pp. 78–85. doi: 10.1145/2629489. url: https://doi.org/10.1145/2629489.