Introduction to Linked Data
Laura Po - Exploration, Visualization and Querying of Linked Open Data sources
2nd Keystone Training School - Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain.


By the end of this module you should have an understanding of
• What is linked data
• What is open data
• What is the difference between linked and open data
• How to publish linked data (5-star schema)
• What are the linked data principles and the linked data technologies
(the semantic web stack)
• The economic and social impact of linked data


The Web of Data
The evolution from a Web of linked documents to a web of linked data
The Web as a huge decentralized database (knowledge base) of machine-
accessible data
Web of documents... Web of linked data...


The evolution of the web
• The Web started as a collection of documents
published online – accessible at Web location
identified by a URL.
• These documents often contain data about real-
world resources which is mainly human-readable
and cannot be understood by machines.
• The Web of Data is about enabling the access to
this data, by making it available in machine-
readable formats and connecting it using Uniform
Resource Identifiers (URIs), thus enabling people
and machines to collect the data, and put it
together to do all kinds of things with it (permitted
by the licence).
Machine-readable data (or
metadata) is data in a format that
can be interpreted by a computer.
Two types of machine-readable data:
• human-readable data that is
marked up so that it can also
be understood by computers,
e.g. microformats, RDFa;
• data formats intended
principally for computers, e.g.


Linked Data and the ‘Web of Data’
● Term refers to an idea originally from Tim Berners-Lee
(Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
● Set of best practices for publication and linking of
structured data on the web
● Basic assumption: The value of data on the web increases
when they are connected to other data sources
M.Hausenblas, Quick Linked Data Introduction, http://www.slideshare.
The Semantic Web isn't just
about putting data on the
web. It is about making
links, so that a person or
machine can explore the web
of data. With linked data,
when you have some of it,
you can find other, related,


Defining linked data
“Linked data is a set of design principles for sharing
machine-readable data on the Web for use by public
administrations, business and citizens.”
EC ISA Case Study: How Linked Data is transforming eGovernment


Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.


How to get Data from the Web?
● Data can only be found on the Web if it is available at some website
Web Server


How to get Data from the Web?
● There are a number of different (proprietary) Web APIs, data exchange
formats and mashups on top of them
Database 1 Database 2 Database 3 Database 4


In the Web today...
● Data is locked up in small data islands
● Other applications usually cannot access this data...


Semantic Web Technologies, Dr. Harald Sack
http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)


How to get rid of Closed Data Islands?
Database 1 Database 2 Database 3 Database 4
● Apply Semantic Web technologies
○ to publish (structured) data on the web
○ to draw connections from one data source to data from other data sources
RDF data RDF data RDF data RDF data


Linked Data Principles (1/4)
1. Use URIs as names for things.
○ URIs do not only identify documents but also arbitrary objects
of the real world as well as abstract concepts


Linked Data Principles (2/4)
2. Use HTTP URIs, so that people can look up those names.
○ HTTP URIs (URLs) as globally unique names enable
dereferencing of associated information in the Web
○ via HTTP content negotiation, machines and humans can
access the resource identified by the URI
[Figure: a URI as designator, representing a designatum]
Every term in a LOD source
must be accessible via its URI
through an HTTP GET. Once
we access the URI, we find
the definition of the term.
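The lookup described above can be sketched in Python. This is a minimal sketch, assuming the server honours the Accept header; the helper name `build_lookup_request` and the choice of `text/turtle` as the RDF media type are illustrative assumptions, not part of the slides. The request is only built here, not sent.

```python
from urllib.request import Request

def build_lookup_request(uri: str, want_rdf: bool = True) -> Request:
    """Build an HTTP GET request for a URI, asking for RDF or for HTML."""
    # The Accept header drives content negotiation: a machine asks for an
    # RDF serialization, a browser would ask for an HTML page.
    accept = "text/turtle" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")

uri = "http://dbpedia.org/resource/Santiago_de_Compostela"
machine = build_lookup_request(uri)                 # machine-oriented lookup
human = build_lookup_request(uri, want_rdf=False)   # human-oriented lookup

print(machine.get_method(), machine.get_header("Accept"))
print(human.get_method(), human.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual dereferencing; which representation comes back depends on the server.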


Linked Data Principles (3/4)
3. When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL)
○ RDF as universal data model for publishing structured data on the Web
○ Make all URIs in the RDF graph dereferenceable
○ Avoid RDF constructs that cause problems in Linked Data context
■ RDF Reification
■ RDF Collections and Containers
■ unnamed Blank Nodes


Linked Data Principles (4/4)
4. Include links to other URIs, so that they can discover more things.
○ Set RDF links between data in different data sources:
○ owl:sameAs – states that two individuals are the same
○ rdfs:seeAlso – states that a resource may provide additional information
○ Relationship Links
Links to external LOD entities related to the original entity
○ Identity Links
Links to external LOD entities referring to the same object or concept
○ Vocabulary Links
Links to the definitions of the vocabulary terms used to describe the original entity
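The three link types can be told apart with a rough heuristic, sketched here in Python. The rule set is an assumption made for illustration (owl:sameAs means identity, rdf:type pointing to another host means vocabulary, any other cross-host link means relationship), and the `example.org` URIs are hypothetical.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def classify_link(subject: str, predicate: str, obj: str) -> str:
    """Classify a triple between two URIs as one of the link types (heuristic)."""
    # A link only counts as external if subject and object are on different hosts.
    if urlparse(subject).netloc == urlparse(obj).netloc:
        return "internal"
    if predicate == OWL_SAME_AS:
        return "identity"
    if predicate == RDF_TYPE:
        return "vocabulary"
    return "relationship"

santiago = "http://dbpedia.org/resource/Santiago_de_Compostela"
print(classify_link(santiago, OWL_SAME_AS, "http://example.org/geo/Santiago"))
print(classify_link(santiago, RDF_TYPE, "http://schema.org/City"))
print(classify_link(santiago, "http://example.org/vocab/near", "http://example.org/geo/Coast"))
print(classify_link(santiago, "http://dbpedia.org/ontology/country",
                    "http://dbpedia.org/resource/Spain"))
```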


Advantages of Linked Open Data vs. APIs
○ Simple and generic API for various heterogeneous data sources
enables simple reuse and data sharing among applications
○ RDF Data model guarantees (simple) extensibility
○ Transport via HTTP on standard port 80 avoids firewall reconfiguration
○ Ontologies enable meaningful connections between data sources
○ Reasoning over Linked Data makes it possible to generate new knowledge,
i.e. to infer explicit knowledge from implicit knowledge




The Semantic Web Technology Stack
Santiago de Compostela
URI - Uniform Resource Identifier


From Wikipedia to DBpedia



RDF - Resource Description Framework
:Santiago_de_Compostela rdf:type dbo:City .
:Santiago_de_Compostela dbo:country dbr:Spain .
:Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela .
dbp:city dbr:Santiago_de_Compostela .
:Santiago_de_Compostela dbp:populationTotal "95671"^^xsd:integer .
RDF Triple = RDF Subject + RDF Property + RDF Object
:Santiago rdf:type dbo:City .


Resource Description Framework
● Resource
○ can be anything
○ must be uniquely identified and referenceable via a URI
● Description
○ = description of resources
○ by representing properties and relationships among resources as graphs
● Framework
○ = combination of web-based protocols and formats (URI, HTTP, XML, Turtle, JSON, …)
○ based on a formal model (semantics)
● Knowledge in RDF is expressed as a list of statements
● All RDF statements follow the same simple schema (= RDF Triple)
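The idea that a list of statements forms a graph can be sketched in a few lines of Python. This is an illustrative in-memory model, not an RDF library; the prefixed names are plain strings taken from the DBpedia example, and the function name `as_graph` is an assumption.

```python
from collections import defaultdict

# Each statement is one (subject, property, object) triple.
triples = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
    ("dbr:Santiago_de_Compostela", "dbp:populationTotal", 95671),
]

def as_graph(statements):
    """Group triples by subject: each subject node gets its labelled outgoing edges."""
    graph = defaultdict(list)
    for subject, prop, obj in statements:
        graph[subject].append((prop, obj))
    return dict(graph)

graph = as_graph(triples)
for prop, obj in graph["dbr:Santiago_de_Compostela"]:
    print("dbr:Santiago_de_Compostela", "--", prop, "->", obj)
```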


Resource Description Framework
● RDF Statements (RDF-Triple):
Subject + Property + Object / Value
URI + URI + URI / Literal (RDF Building Blocks)
N-Triples Serialization
… /Santiago_de_Compostela> <http://dbpedia.org/ontology/ … “95671” .


Resource Description Framework
● URIs and Literals
○ URIs reference resources uniquely
○ Literals describe data values that don’t have a separate existence
“95671” .
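A minimal Python sketch of how URIs and literals end up in one N-Triples line. The `URI` and `Literal` classes and the function `to_ntriples` are hypothetical helpers written for this illustration; only quote and backslash escaping is handled, and the property URI expands the `dbp:` prefix used elsewhere in these slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # datatype URI, if the literal is typed

def to_ntriples(subject, predicate, obj) -> str:
    """Serialize one triple as an N-Triples line (simplified escaping)."""
    def fmt(term):
        if isinstance(term, URI):
            return f"<{term.value}>"            # URIs go in angle brackets
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'                   # literals go in double quotes
        return f"{text}^^<{term.datatype}>" if term.datatype else text
    return f"{fmt(subject)} {fmt(predicate)} {fmt(obj)} ."

line = to_ntriples(
    URI("http://dbpedia.org/resource/Santiago_de_Compostela"),
    URI("http://dbpedia.org/property/populationTotal"),
    Literal("95671", "http://www.w3.org/2001/XMLSchema#integer"),
)
print(line)
```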


RDF Schema
dbo:City rdf:type owl:Class .
dbo:City rdfs:subClassOf dbo:Settlement .
dbo:foundationPlace rdfs:range
City foundation
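What rdfs:subClassOf buys in practice can be sketched in Python: every instance of a class is also an instance of all its superclasses. This is a simplified sketch, not a full RDFS reasoner; the second subclass statement (dbo:Settlement below dbo:PopulatedPlace) is added here as an assumption for illustration.

```python
# Class hierarchy as a mapping from class to its direct superclasses.
SUBCLASS_OF = {
    "dbo:City": {"dbo:Settlement"},
    "dbo:Settlement": {"dbo:PopulatedPlace"},  # assumed for the example
}

def superclasses(cls, subclass_of):
    """Return all direct and indirect superclasses (rdfs:subClassOf is transitive)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_types(asserted, subclass_of):
    """Add every superclass of each asserted type."""
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, subclass_of)
    return result

print(sorted(inferred_types({"dbo:City"}, SUBCLASS_OF)))
```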


logical constraint
Spain Madrid
Small_town ∩ Capital = ∅
∀x. ( City(x)∧ seatOfGovernment(x) → Capital(x) )
description logics
+ logical rules
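The rule and the disjointness constraint above can be checked over a small set of facts. The Python sketch below treats each predicate as a set of individuals; the facts themselves are illustrative assumptions, and the function names are hypothetical.

```python
facts = {
    "City": {"Madrid", "Santiago_de_Compostela"},
    "seatOfGovernment": {"Madrid"},
    "Small_town": set(),
    "Capital": set(),
}

def apply_capital_rule(facts):
    """Rule: City(x) and seatOfGovernment(x) implies Capital(x)."""
    derived = facts["City"] & facts["seatOfGovernment"]
    return {**facts, "Capital": facts["Capital"] | derived}

def disjointness_violations(facts):
    """Constraint: Small_town and Capital must have no common member."""
    return facts["Small_town"] & facts["Capital"]

closed = apply_capital_rule(facts)
print(closed["Capital"])                  # individuals inferred to be capitals
print(disjointness_violations(closed))    # empty set: the constraint holds
```

Adding "Madrid" to `Small_town` would make `disjointness_violations` return it, which is how a reasoner would flag the knowledge base as inconsistent.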


Look for all cities located in the same area of
Santiago de Compostela (use the property
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?area ?city
FROM <http://dbpedia.org/> WHERE {
?area dbp:subdivisionName dbr:Santiago_de_Compostela .
?area dbp:subdivisionName ?city .
}
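How the two triple patterns of this query join on ?area can be sketched with a tiny in-memory matcher in Python. This is an illustration of basic graph pattern matching, not a SPARQL engine; the `ex:` resources are hypothetical toy data, and the helper names are assumptions.

```python
def match(pattern, triple, binding):
    """Extend a binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            # Variables must bind consistently across patterns.
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new

def select(patterns, triples):
    """Join all triple patterns, returning one binding per solution."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in triples
                     if (b2 := match(pattern, t, b)) is not None]
    return solutions

triples = [
    ("ex:Area_A", "dbp:subdivisionName", "dbr:Santiago_de_Compostela"),
    ("ex:Area_A", "dbp:subdivisionName", "ex:City_B"),
    ("ex:Area_C", "dbp:subdivisionName", "ex:City_D"),
]
patterns = [
    ("?area", "dbp:subdivisionName", "dbr:Santiago_de_Compostela"),
    ("?area", "dbp:subdivisionName", "?city"),
]
results = select(patterns, triples)
for row in results:
    print(row["?area"], row["?city"])
```

Note that ?city also binds to Santiago de Compostela itself, because the second pattern matches the same triple as the first.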




SPARQL – SPARQL Protocol and RDF Query Language
A query language for RDF data, designed with a syntax similar to that of SQL,
the language used for retrieving data from relational databases.
Different query forms:
• SELECT returns variables and their bindings directly.
• CONSTRUCT returns a single RDF graph specified by a graph template.
• ASK tests whether or not a query pattern has a solution. Returns yes/no.
• DESCRIBE returns a single RDF graph containing RDF data about resources.
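The difference between the query forms lies in the shape of the result, which the Python sketch below illustrates over a single triple pattern. It is a toy model under stated assumptions, not SPARQL semantics in full; DESCRIBE is left out because its result is implementation-defined, and all function names are hypothetical.

```python
def solutions(pattern, triples):
    """Yield one binding dict per triple that matches a single triple pattern."""
    for triple in triples:
        binding = {}
        for p, v in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, v) != v:
                    break
            elif p != v:
                break
        else:
            yield binding

def select(variables, pattern, triples):
    """SELECT: a table of variable bindings."""
    return [tuple(b[v] for v in variables) for b in solutions(pattern, triples)]

def ask(pattern, triples):
    """ASK: True if the pattern has at least one solution."""
    return any(True for _ in solutions(pattern, triples))

def construct(template, pattern, triples):
    """CONSTRUCT: a new graph built by filling the template per solution."""
    return {tuple(b.get(t, t) for t in template) for b in solutions(pattern, triples)}

triples = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Madrid", "rdf:type", "dbo:City"),
    ("dbr:Spain", "rdf:type", "dbo:Country"),
]
pattern = ("?x", "rdf:type", "dbo:City")

print(select(["?x"], pattern, triples))
print(ask(pattern, triples))
print(construct(("?x", "rdf:type", "dbo:Settlement"), pattern, triples))
```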


SQL
• Based on relations (tables).
• The relations (tables) to be matched over should be …
• (Retrieval) queries produce a relation from a relation.
SPARQL
• Based on labelled directed …
• Assumes a default graph. (The FROM clause populates this with specific identified …
• SPARQL SELECT queries produce a relation from a graph.
• CONSTRUCT queries (considered later) produce a graph from a …

The application of the Linked Data Principles leads to a ‘Web of Data’
>74B RDF Triples
808M Links
as of August 2014


The Development of the Web of Data
[Figures: snapshots from May 2007, Nov 2007, July 2009 and Aug 2014]


Linked Open Data
○ Public Linked Data resources on the Web, licensed under an open licence such as Creative Commons CC-BY
○ Tim Berners-Lee's 5-Star Criteria for Linked Open Data
★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data
(e.g. Excel instead of image scan of a table)
★★★ As (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C
(URI, RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
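Because each star builds on the previous ones, a rating can be computed by counting criteria until the first one that fails. The Python sketch below assumes a dataset is described by five yes/no answers; the `Dataset` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    on_web_open_licence: bool      # 1 star
    machine_readable: bool         # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_w3c_standards: bool       # 4 stars (URI, RDF, SPARQL)
    links_to_other_data: bool      # 5 stars

def stars(dataset: Dataset) -> int:
    """Count stars; a star only counts if all lower stars are earned too."""
    criteria = [
        dataset.on_web_open_licence,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_w3c_standards,
        dataset.links_to_other_data,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count

csv_dump = Dataset(True, True, True, False, False)
print(stars(csv_dump))
```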


Linked Open Data


December 2007
8 principles for Open Government Data:
Primary (not aggregate)
Up to date
Machine processable
No license fees


Linked Data vs Open Data
Open data
Data can be published and
be publicly available under
an open licence without
linking to other data.
Linked data
Data can be linked to URIs from
other data sources, using open
standards such as RDF, without
being publicly available under an
open licence.
“Open data is data that can be freely used, reused and
redistributed by anyone – subject only, at most, to the
requirement to attribute and sharealike.”
- OpenDefinition.org
Cobden et al., A research agenda for Linked Closed Data


Linked (open) government data
• Flexible data integration: LOGD (linked open government data) facilitates data
integration and enables the interconnection of previously disparate government datasets.
• Increase in data quality: The increased (re)use of LOGD triggers a growing
demand to improve data quality. Through crowd-sourcing and self-service
mechanisms, errors are progressively corrected.
• New services: The availability of LOGD gives rise to new services offered
by the public and/or private sector.
• Cost reduction: The reuse of LOGD in e-Government applications leads to
considerable cost reductions.
ISA Study on Business Models for LOGD


Key milestones for linked government data


Linked Data - A Guided Tour
● Datasets ordered
by category


● 183 datasets
● top 10 highest indegree: reference.data.gov.uk
● 48 proprietary vocabularies used
● c. 21% fully dereferencable
Every term in a LOD source must be
accessible via its URI through an HTTP
GET. Once we access the URI, we find the
definition of the term.
The dereferencability quota of a LOD
source is defined as the number of
dereferencable terms divided by the number
of all terms collected in the source.
Fully dereferencable LOD source: a
definition exists for all URIs.
Partially dereferencable LOD source: a
definition could be retrieved for some
terms, but not for all.
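The quota defined above is a simple ratio, sketched here in Python. The sketch assumes the lookups have already been performed and recorded as a mapping from term URI to success; the function names and the `example.org` terms are hypothetical.

```python
def dereferencability_quota(results):
    """Number of dereferencable terms divided by all terms of the source."""
    if not results:
        raise ValueError("the source has no terms")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(results.values()) / len(results)

def classify_source(results):
    """Label a source as fully, partially or not dereferencable."""
    quota = dereferencability_quota(results)
    if quota == 1.0:
        return "fully dereferencable"
    if quota > 0:
        return "partially dereferencable"
    return "not dereferencable"

lookups = {
    "http://example.org/vocab/City": True,
    "http://example.org/vocab/Country": True,
    "http://example.org/vocab/area": False,
    "http://example.org/vocab/mayor": False,
}
print(dereferencability_quota(lookups), classify_source(lookups))
```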


● 22 datasets
● 22 proprietary vocabularies used
● 0% fully dereferencable
● 9% partially dereferencable


User Generated Content
● 48 datasets
● top 10 highest outdegree: semanticweb.org
● 30 proprietary vocabularies used
● 13% fully dereferencable
● 10% partially dereferencable


● no statistics available so far


Bibliographic Data
● 96 datasets
● top 10 highest indegree: data.semanticweb.org
● top 10 highest outdegree: bibsonomy.org
● 58 proprietary vocabularies used
● 21% fully dereferencable
● 7% partially dereferencable


Life Sciences
● 83 datasets
● 35 proprietary vocabularies used
● 28% fully dereferencable
● 6% partially dereferencable


Cross Domain
● 41 datasets
● top 10 highest indegree: dbpedia.org, w3.org,
● 55 proprietary vocabularies used
● 27% fully dereferencable
● 11% partially dereferencable


Social Networking
● 520 datasets
● top 10 highest indegree: quitter.se, status.net, …
● top 10 highest outdegree: deri.org, harth.org,...
● 128 proprietary vocabularies used
● 16% fully dereferencable
● 6% partially dereferencable


● 21 datasets
● top 10 highest indegree: geonames.org
● 24 proprietary vocabularies used
● 21% fully dereferencable
● 4% partially dereferencable


Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
owl:sameAs connects identical
owl:equivalentClass connects
equivalent classes


Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
○ “Simple Knowledge Organization System”
○ based on RDF and RDFS
○ applied for definitions and mappings of
vocabularies and ontologies
■ skos:Concept (classes)
■ skos:narrower
■ skos:broader
■ skos:related
■ skos:exactMatch (vocabulary)
■ skos:narrowMatch
■ skos:broadMatch
■ skos:relatedMatch


Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
● umbel
○ “Upper Mapping and Binding Exchange
○ Subset of OpenCyc as RDF triples based on
○ Upper ontology with 28,000 concepts
○ 46,000 mappings into DBpedia,
geonames, and others
(owl:equivalentClass, rdfs:
○ Links to more than 2 million Wikipedia pages


Member State initiatives – some examples
Some examples on supra-national, national, regional and private initiatives in the
area of linked (open) data across Europe.
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web
services and conduction systems and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open
and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data
using a simple URI for each company in their database.
ISA Study on Business Models for LOGD


Linked Government Data & Metadata initiatives
funded by the European Commission


Linked Government Data Pilots


Non-governmental applications


• Linked data is a set of design principles for sharing machine-readable
data on the Web.
• Linked data and open data are not the same.
• URIs, RDF and SPARQL form the foundational layer for Linked data.
• Linked data offers a number of advantages:
• Data integration with small impact on legacy systems;
• Semantic interoperability;
• Creativity and innovation through context and knowledge creation.


Group questions
Is there supply and demand for (Linked) Open
Government Data in your country?
What are, in your opinion, the expected benefits
and pitfalls of Linked Data?
Do you know if there are any Linked (Open) Data
initiatives in your country? If so, how many stars
would you give them?


Download the slides from
my research group website
on SlideShare


Some of the materials used in these slides have been rearranged from
- Slides of the “Knowledge Engineering with Semantic Web Technologies
2015” course held by Dr. Harald Sack
- Slides of "Introduction to linked data" by Open Data Support
- Slides of "Usage of Linked Data: Introduction and Application Scenarios"
and "Querying Linked Data" by Barry Norton, EUCLID project


Further readings
Linked Open Government Data. Li Ding Qualcomm, Vassilios Peristeras and Michael
EUCLID - Course 1: Introduction and Application Scenarios http://www.euclid-
Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck.
Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer.


Related projects and initiatives
LOD2 FP7 project, http://lod2.eu/
The Open Knowledge Foundation, http://okfn.org/
W3C Semantic Web, http://www.w3.org/standards/semanticweb/
EUCLID,
ISA Programme, http://ec.europa.eu/isa/
W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page
LOD Around The Clock FP7 project, http://latc-project.eu/
Data.gov.uk, http://data.gov.uk/linked-data

