
1

Introduction to Linked Data
Laura Po - Exploration, Visualization and Querying of Linked Open Data sources
2nd Keystone Training School - Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain.
Laura Po

2

Introduction to linked data

3

Objectives
By the end of this module you should have an understanding of
• What is linked data
• What is open data
• What is the difference between linked and open data
• How to publish linked data (5-star schema)
• What are the linked data principles and the linked data technologies
(the semantic web stack)
• The economic and social impact of linked data

4

Introduction to linked data

5

The Web of Data
The evolution from a Web of linked documents to a web of linked data
The Web as a huge decentralized database (knowledge base) of machine-accessible data
Web of documents... Web of linked data...

6

The evolution of the web
• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources; this data is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs). People and machines can then collect the data and put it together to do all kinds of things with it (as permitted by the licence).
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.
2 types of machine-readable data:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.

7

Linked Data and the ‘Web of Data’
● Term refers to an idea originally from Tim Berners-Lee
(Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
● Set of best practices for publication and linking of
structured data on the web
● Basic assumption: The value of data on the web increases
when they are connected to other data sources
M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.

8

Defining linked data
“Linked data is a set of design principles for sharing
machine-readable data on the Web for use by public
administrations, business and citizens.”
EC ISA Case Study: How Linked Data is transforming eGovernment

9

Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more
things.

10

Introduction to linked data

11

How to get Data from the Web?
● Data can only be found on the Web, if it is available at some website
[Diagram: Browser <-HTTP-> Web Server <-JDBC-> Database]

12

How to get Data from the Web?
● There are a number of different (proprietary) Web APIs and data exchange formats, with mashups built on top of them
[Diagram: Databases 1 to 4, each behind its own Web API (1 to 4), combined in a Mashup]

13

In the Web today...
● Data is locked up in small data islands
● Other applications usually cannot access this data...
[Diagram: ten separate databases]

14

Semantic Web Technologies, Dr. Harald Sack, Hasso Plattner Institute
http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)

15

How to get rid of Closed Data Islands?
Database 1 Database 2 Database 3 Database 4
● Apply Semantic Web technologies
○ to publish (structured) data on the web
○ to draw connections from one data source to data from other data sources
RDF data RDF data RDF data RDF data

16

Linked Data Principles (1/4)
1. Use URIs as names for things.
○ URIs do not only identify documents but also arbitrary objects
of the real world as well as abstract concepts
https://viaf.org/viaf/32197206/
http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970
http://www.imdb.com/title/tt3659388

17

Linked Data Principles (2/4)
2. Use HTTP URIs, so that people can look up those names.
○ HTTP URIs (URLs) as globally unique names enable
dereferencing of associated information in the Web
○ Via HTTP content negotiation, both machines and humans can access the resource identified by the URI
[Diagram: content negotiation for one URI]
http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart (URI represents the designatum)
http://dbpedia.org/data/Wolfgang_Amadeus_Mozart (RDF document, for machines; URI represents a designator)
http://dbpedia.org/page/Wolfgang_Amadeus_Mozart (HTML document, for humans; URI represents a designator)
Dereferencable
Every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
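A minimal sketch of how a client could ask for the machine or the human representation of the same URI. It only builds the HTTP GET request and does not send it; the media types in the Accept header and the helper name build_lookup_request are assumptions for illustration, not something the slides prescribe.

```python
from urllib.request import Request

# Assumed media types; a given server may support others.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"
HTML_ACCEPT = "text/html"


def build_lookup_request(uri: str, for_machine: bool) -> Request:
    """Build (but do not send) an HTTP GET request for a Linked Data URI.

    The Accept header is what drives content negotiation: the server
    decides from it whether to answer with RDF or with an HTML page.
    """
    accept = RDF_ACCEPT if for_machine else HTML_ACCEPT
    return Request(uri, headers={"Accept": accept}, method="GET")


mozart = "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"
machine_request = build_lookup_request(mozart, for_machine=True)
human_request = build_lookup_request(mozart, for_machine=False)

print(machine_request.get_method(), machine_request.full_url)
print("Accept (machine):", machine_request.get_header("Accept"))
print("Accept (human):", human_request.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual lookup; that step is left out so the sketch runs without network access.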

18

Linked Data Principles (3/4)
3. When someone looks up a URI, provide useful information, using the
standards (RDF, SPARQL)
○ RDF as universal data model for publishing structured data on the Web
○ Make all URIs in the RDF graph dereferenceable
○ Avoid RDF constructs that cause problems in Linked Data context
■ RDF Reification
■ RDF Collections and Containers
■ unnamed Blank Nodes

19

Linked Data Principles (4/4)
4. Include links to other URIs, so that they can discover more things.
○ Create RDF links between data in different data sources:
○ owl:sameAs – creates a link between individuals
○ rdfs:seeAlso – states that a resource may provide additional information
○ Relationship Links
Links to external LOD entities related to the original entity
○ Identity Links
Links to external LOD entities referring to the same object or concept
○ Vocabulary Links
Links to definitions of the original entity
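Identity links can be followed mechanically. The sketch below collects every identifier reachable from a starting resource through owl:sameAs links, treating the link as symmetric and transitive. The triples are hypothetical, abbreviated identifiers invented for illustration, and same_as_cluster is an illustrative helper name.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"

# Hypothetical links between three data sources (illustrative identifiers only).
triples = [
    ("dbr:Santiago_de_Compostela", OWL_SAME_AS, "geodata:Santiago_de_Compostela"),
    ("geodata:Santiago_de_Compostela", OWL_SAME_AS, "ex:city/123"),
    ("dbr:Santiago_de_Compostela", "rdfs:seeAlso", "dbr:Galicia"),
]


def same_as_cluster(start, triples):
    """Return all identifiers connected to `start` by owl:sameAs links."""
    neighbours = defaultdict(set)
    for subject, prop, obj in triples:
        if prop == OWL_SAME_AS:
            # owl:sameAs is symmetric, so record the link in both directions.
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            todo.append(other)
    return seen


print(sorted(same_as_cluster("dbr:Santiago_de_Compostela", triples)))
```

The rdfs:seeAlso triple is deliberately ignored: it points to related information, not to another name for the same thing.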

20

Advantages of Linked Open Data vs. APIs
○ A simple, generic API for various heterogeneous data sources enables simple reuse and data sharing among applications
○ The RDF data model guarantees (simple) extensibility
○ Transport via HTTP on standard port 80 avoids the need to adapt firewalls
○ Ontologies enable meaningful connections between data sources
○ Reasoning over Linked Data makes it possible to generate new knowledge, i.e. to infer explicit knowledge from implicit knowledge

21

Introduction to linked data

22

Introduction to linked data

23

The Semantic Web Technology Stack
http://dbpedia.org/resource/
Santiago_de_Compostela
Santiago de Compostela
URI - Uniform Resource Identifier

24

From Wikipedia to DBpedia
https://en.wikipedia.org/wiki/
Santiago_de_Compostela
http://dbpedia.org/resource/Santiago_de_Compostela

25

From Wikipedia to DBpedia
http://dbpedia.org/resource/Santiago_de_Compostela

26

RDF - Resource Description Framework
From Wikipedia to DBpedia
http://dbpedia.org/resource/Santiago_de_Compostela
:Santiago_de_Compostela rdf:type dbo:City .
:Santiago_de_Compostela dbo:country dbr:Spain .
:Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela .
dbr:University_of_Santiago_de_Compostela dbp:city dbr:Santiago_de_Compostela .
:Santiago_de_Compostela dbp:populationTotal "95671"^^xsd:integer .
...
RDF Triple = RDF Subject + RDF Property + RDF Object, e.g.
:Santiago rdf:type dbo:City .
(RDF Subject = :Santiago, RDF Property = rdf:type, RDF Object = dbo:City)

27

Resource Description Framework
● Resource
○ can be anything
○ must be uniquely identified and referenceable via a URI
● Description
○ = description of resources
○ by representing properties and relationships among resources as graphs
● Framework
○ = combination of web-based protocols and formats (URI, HTTP, XML, Turtle, JSON, …)
○ based on a formal model (semantics)
● Knowledge in RDF is expressed as a list of statements
● All RDF statements follow the same simple schema (= RDF triple)

28

Resource Description Framework
● RDF Statements (RDF triples):
Subject + Property + Object / Value
URI + URI + URI / Literal (RDF building blocks)
N-Triples serialization:
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .
Graph representation:
<http://dbpedia.org/resource/Santiago_de_Compostela> --<http://dbpedia.org/ontology/populationTotal>--> "95671"
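The N-Triples form of a statement is simple enough to produce by hand. The sketch below writes one triple per line from a subject URI, a property URI and either a URI or a literal; the helper names are illustrative, and the literal escaping is deliberately simplified (it handles only backslashes and double quotes).

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(lexical: str, datatype: str = "") -> str:
    """Quote a literal value; optionally attach a datatype URI.

    Simplified escaping: only backslash and double quote are handled.
    """
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return f"{quoted}^^<{datatype}>" if datatype else quoted


def ntriple(subject: str, prop: str, obj: str) -> str:
    """Join the three building blocks into one N-Triples line."""
    return f"{subject} {prop} {obj} ."


plain = ntriple(
    iri("http://dbpedia.org/resource/Santiago_de_Compostela"),
    iri("http://dbpedia.org/ontology/populationTotal"),
    literal("95671"),
)
typed = ntriple(
    iri("http://dbpedia.org/resource/Santiago_de_Compostela"),
    iri("http://dbpedia.org/ontology/populationTotal"),
    literal("95671", XSD_INTEGER),
)
print(plain)
print(typed)
```

The first line reproduces the plain literal from the slide; the second shows the same value with an explicit xsd:integer datatype.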

29

Resource Description Framework
● URIs and Literals
○ URIs reference resources uniquely
○ Literals describe data values that don’t have a separate existence
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .

30

The Semantic Web Technology Stack
RDF Schema
http://dbpedia.org/ontology/City
dbo:City rdf:type owl:Class .
dbo:City rdfs:subClassOf dbo:Settlement .
dbo:foundationPlace rdfs:range dbo:City .
...
[Diagram: City --rdfs:subClassOf--> Settlement; foundationPlace --rdfs:range--> City]
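Schema statements like these let a program derive facts that were never stated. The sketch below applies two RDFS entailment rules (rdfs:range and rdfs:subClassOf) to a toy graph until nothing new appears. The company triple is a hypothetical example added for illustration, and rdfs_infer is an illustrative helper, not a complete RDFS reasoner.

```python
# Schema triples from the slide plus one hypothetical data triple.
triples = {
    ("dbo:City", "rdfs:subClassOf", "dbo:Settlement"),
    ("dbo:foundationPlace", "rdfs:range", "dbo:City"),
    ("ex:SomeCompany", "dbo:foundationPlace", "dbr:Santiago_de_Compostela"),
}


def rdfs_infer(triples):
    """Apply two RDFS rules repeatedly until no new triple is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Range rule: (P rdfs:range C) and (s P o) => (o rdf:type C)
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
                # Subclass rule: (s rdf:type C) and (C rdfs:subClassOf D) => (s rdf:type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_infer(triples)
for triple in sorted(closure - triples):
    print(triple)
```

From the single data triple, the range rule derives that Santiago de Compostela is a dbo:City, and the subclass rule then derives that it is also a dbo:Settlement.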

31

The Semantic Web Technology Stack
Description logics + logical rules
[Diagram: classes (e.g. City) and entities (Spain, Madrid), connected by dbo:country, rdf:type and rdfs:subClassOf]
Logical constraint: Small_town ∩ Capital = ∅
Logical rule: ∀x. ( City(x) ∧ seatOfGovernment(x) → Capital(x) )
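The rule and the constraint on this slide can be read as plain set operations. In the sketch below each class or property is modelled as a set of entity names; the rule City(x) ∧ seatOfGovernment(x) → Capital(x) becomes a set intersection, and the constraint Small_town ∩ Capital = ∅ becomes an emptiness check. The facts are hypothetical and the function names are illustrative.

```python
# Hypothetical class memberships (sets of entity names), for illustration only.
facts = {
    "City": {"Madrid", "Santiago_de_Compostela"},
    "seatOfGovernment": {"Madrid"},
    "Small_town": {"Some_small_town"},
    "Capital": set(),
}


def apply_capital_rule(facts):
    """Rule: every x that is a City and a seatOfGovernment is a Capital."""
    derived = {name: set(members) for name, members in facts.items()}
    derived["Capital"] |= derived["City"] & derived["seatOfGovernment"]
    return derived


def is_disjoint(facts, class_a, class_b):
    """Constraint check: the two classes share no member."""
    return not (facts[class_a] & facts[class_b])


derived = apply_capital_rule(facts)
print(sorted(derived["Capital"]))
print(is_disjoint(derived, "Small_town", "Capital"))
```

A real reasoner works on the logical axioms directly; this sketch only shows what the two statements mean for a concrete set of facts.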

32

The Semantic Web Technology Stack
http://dbpedia.org/sparql
Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName)
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?area ?city
FROM <http://dbpedia.org/> WHERE {
  # areas (subdivisions) that Santiago de Compostela belongs to
  dbr:Santiago_de_Compostela dbp:subdivisionName ?area .
  # resources that share one of those areas
  ?city dbp:subdivisionName ?area .
}
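To see what the endpoint does with such a query, the sketch below evaluates two triple patterns against a small in-memory graph, joining them on the shared variable ?area. It assumes the task is read as "Santiago de Compostela has subdivision ?area, and ?city has the same ?area". The data is a hypothetical toy graph, not DBpedia content, and match and solve are illustrative helpers rather than a SPARQL engine.

```python
# Hypothetical toy graph; real DBpedia data will differ.
graph = [
    ("dbr:Santiago_de_Compostela", "dbp:subdivisionName", "dbr:Galicia"),
    ("dbr:A_Coruña", "dbp:subdivisionName", "dbr:Galicia"),
    ("dbr:Madrid", "dbp:subdivisionName", "dbr:Community_of_Madrid"),
]

patterns = [
    ("dbr:Santiago_de_Compostela", "dbp:subdivisionName", "?area"),
    ("?city", "dbp:subdivisionName", "?area"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Returns the extended variable bindings, or None if they conflict.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, graph):
    """Evaluate the patterns one after another, keeping compatible bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


for row in solve(patterns, graph):
    print(row["?area"], row["?city"])
```

Because ?area is bound by the first pattern, the second pattern only matches resources in the same area; Madrid is filtered out by the join.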

33

http://dbpedia.org/sparql
Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName)

34

SPARQL – SPARQL Protocol and RDF Query Language
Query language for RDF data, designed with a syntax similar to SQL, the language used for retrieving data from relational databases.
Different query forms:
• SELECT returns variables and their bindings directly.
• CONSTRUCT returns a single RDF graph specified by a graph template.
• ASK tests whether or not a query pattern has a solution. Returns yes/no.
• DESCRIBE returns a single RDF graph containing RDF data about resources.
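The four query forms differ mainly in the shape of their result. The sketch below mimics each form over a toy graph held in a Python list, so the return types (table of bindings, boolean, graph) are visible. The function names and the ex:hasCity property are invented for illustration; what DESCRIBE returns on a real endpoint is decided by the implementation.

```python
# Toy RDF graph as (subject, property, object) tuples; names are illustrative.
GRAPH = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
    ("dbr:Madrid", "dbo:country", "dbr:Spain"),
]


def select_query(graph, prop):
    """SELECT ?s ?o WHERE { ?s prop ?o } -> a table of variable bindings."""
    return [{"s": s, "o": o} for s, p, o in graph if p == prop]


def ask_query(graph, subject, prop, obj):
    """ASK { subject prop obj } -> a single boolean."""
    return (subject, prop, obj) in graph


def construct_query(graph, prop, new_prop):
    """CONSTRUCT { ?o new_prop ?s } WHERE { ?s prop ?o } -> a new graph."""
    return [(o, new_prop, s) for s, p, o in graph if p == prop]


def describe_query(graph, resource):
    """DESCRIBE resource -> here, every triple that mentions the resource."""
    return [t for t in graph if resource in (t[0], t[2])]


print(select_query(GRAPH, "dbo:country"))
print(ask_query(GRAPH, "dbr:Madrid", "dbo:country", "dbr:Spain"))
print(construct_query(GRAPH, "dbo:country", "ex:hasCity"))
print(describe_query(GRAPH, "dbr:Spain"))
```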

35

SQL versus SPARQL
SQL | SPARQL
Based on relations (tables). | Based on labelled directed graphs.
The relations (tables) to be matched over should be indicated. | Assumes a default graph. (The FROM clause populates this with specific identified subgraphs.)
(Retrieval) queries produce a relation from a relation. | SPARQL SELECT queries produce a relation from a graph. CONSTRUCT queries (considered later) produce a graph from a graph.

36

Introduction to linked data

37

The application of the Linked Data Principles leads to a ‘Web of Data’
> 1014 datasets
> 74B RDF triples
808M links
as of August 2014

38

The Development of the Web of Data
May 2007

39

The Development of the Web of Data
Nov 2007

40

The Development of the Web of Data

41

The Development of the Web of Data
July 2009

42

The Development of the Web of Data
Aug 2014

43

Linked Open Data
○ Public Linked Data resources on the Web, licensed as Creative Commons CC-BY
○ Tim Berners-Lee’s 5-Star Criteria for Linked Open Data
★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ As (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (URI, RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
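Because each star builds on the previous ones, the rating can be computed by counting criteria in order until the first one that fails. The sketch below does this for a dataset described by five yes/no fields; the Dataset class and its field names are hypothetical, chosen to mirror the five criteria.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_licence: bool            # 1 star: on the web under an open licence
    machine_readable: bool        # 2 stars: structured, machine-readable data
    non_proprietary_format: bool  # 3 stars: e.g. CSV instead of Excel
    uses_w3c_standards: bool      # 4 stars: URIs, RDF, SPARQL
    links_to_other_data: bool     # 5 stars: linked to other people's data


def star_rating(dataset: Dataset) -> int:
    """Count stars in order; stop at the first criterion that is not met."""
    criteria = [
        dataset.open_licence,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_w3c_standards,
        dataset.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_file = Dataset(True, True, True, False, False)
scanned_pdf = Dataset(True, False, False, False, False)
print(star_rating(csv_file), star_rating(scanned_pdf))
```

An openly licensed CSV file earns three stars, while an openly licensed scan earns one; a dataset without an open licence earns none, however well it is linked.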

44

Linked Open Data
http://5stardata.info/en/

45

Introduction to linked data

46

8 principles of Open Government Data (December 2007):
Complete
Primary (not aggregate)
Up to date
Accessible
Machine processable
Non-discriminatory
Non-proprietary
No license fees
https://opengovdata.org/

47

Linked Data vs Open Data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.”
- OpenDefinition.org
See also:
Cobden et al., A research agenda for Linked Closed Data
http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

48

Linked (open) government data
• Flexible data integration: Linked Open Government Data (LOGD) facilitates data integration and enables the interconnection of previously disparate government datasets.
• Increase in data quality: The increased (re)use of LOGD triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
• New services: The availability of LOGD gives rise to new services offered by the public and/or private sector.
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions.
See also:
ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

49

Key milestones for linked government data

50

Introduction to linked data

51

Linked Data - A Guided Tour
● Datasets ordered
by category
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

52

Government
● 183 datasets
● top 10 highest indegree: reference.data.gov.uk
● 48 proprietary vocabularies used
● c. 21% fully dereferencable
Dereferencable
Every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
The dereferencability quota of a LOD source is defined as the number of dereferencable terms divided by the number of all terms collected in the source.
Fully dereferencable LOD source: a definition exists for all URIs
Partially dereferencable LOD source: a definition could be retrieved for some terms, but not for all
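The quota is a simple ratio, so it can be computed from a table that records, for each term, whether its URI returned a definition. The sketch below assumes such a table has already been collected (the lookups themselves are not performed), and the example terms and outcomes are hypothetical.

```python
def dereferencability_quota(lookups: dict) -> float:
    """Number of dereferencable terms divided by the number of all terms.

    `lookups` maps each term URI to True if an HTTP GET returned a definition.
    """
    if not lookups:
        raise ValueError("the source contains no terms")
    return sum(1 for ok in lookups.values() if ok) / len(lookups)


def classify(lookups: dict) -> str:
    """Label a source as fully, partially or not dereferencable."""
    quota = dereferencability_quota(lookups)
    if quota == 1.0:
        return "fully dereferencable"
    if quota > 0.0:
        return "partially dereferencable"
    return "not dereferencable"


# Hypothetical lookup results for the terms of one source.
lookups = {
    "http://example.org/vocab#City": True,
    "http://example.org/vocab#country": True,
    "http://example.org/vocab#population": True,
    "http://example.org/vocab#mayor": False,
}
print(dereferencability_quota(lookups), classify(lookups))
```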

53

Media
● 22 datasets
● 22 proprietary vocabularies used
● 0% fully dereferencable
● 9% partially dereferencable

54

User Generated Content
● 48 datasets
● top 10 highest outdegree: semanticweb.org
● 30 proprietary vocabularies used
● 13% fully dereferencable
● 10% partially dereferencable

55

Linguistics
● no statistics available so far

56

Bibliographic Data
● 96 datasets
● top 10 highest indegree: data.semanticweb.org
● top 10 highest outdegree: bibsonomy.org
● 58 proprietary vocabularies used
● 21% fully dereferencable
● 7% partially dereferencable

57

Life Sciences
● 83 datasets
● 35 proprietary vocabularies used
● 28% fully dereferencable
● 6% partially dereferencable

58

Cross Domain
● 41 datasets
● top 10 highest indegree: dbpedia.org, w3.org,
lexvo.org
● 55 proprietary vocabularies used
● 27% fully dereferencable
● 11% partially dereferencable

59

Social Networking
● 520 datasets
● top 10 highest indegree: quitter.se, status.net, …
● top 10 highest outdegree: deri.org, harth.org,...
● 128 proprietary vocabularies used
● 16% fully dereferencable
● 6% partially dereferencable

60

Geographic
● 21 datasets
● top 10 highest indegree: geonames.org
● 24 proprietary vocabularies used
● 21% fully dereferencable
● 4% partially dereferencable

61

Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
● OWL
owl:sameAs connects identical
individuals
owl:equivalentClass connects
equivalent classes

62

Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
● SKOS
○ „Simple Knowledge Organization System“
○ based on RDF and RDFS
○ applied for definitions and mappings of
vocabularies and ontologies
■ skos:Concept (classes)
■ skos:narrower
■ skos:broader
■ skos:related
■ skos:exactMatch (vocabulary)
■ skos:narrowMatch
■ skos:broadMatch
■ skos:relatedMatch

63

Linked Data Ontologies
● Ontologies hold the
Linked Data Cloud together
● UMBEL
○ „Upper Mapping and Binding Exchange Layer“
○ Subset of OpenCyc as RDF triples, based on SKOS and OWL 2
○ Upper ontology with 28,000 concepts (skos:Concept)
○ 46,000 mappings into DBpedia, GeoNames and others (owl:equivalentClass, rdfs:subClassOf)
○ Links to more than 2 million Wikipedia pages

64

Introduction to linked data

65

Member State initiatives – some examples
Some examples of supra-national, national, regional and private initiatives in the area of linked (open) data across Europe.
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web
services and conduction systems and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open
and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data
using a simple URI for each company in their database.
See also:
ISA Study on Business Models for LOGD
https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

66

Linked Government Data & Metadata initiatives
funded by the European Commission
ADMS.SW
CORE PUBLIC SERVICE VOCABULARY

67

Linked Government Data Pilots
http://health.testproject.eu/PPP/
http://maritime.testproject.eu/CISE/
http://cpsv.testproject.eu/CPSV/

68

Non-governmental applications

69

Conclusion
• Linked data is a set of design principles for sharing machine-readable
data on the Web.
• Linked data and open data are not the same.
• URIs, RDF and SPARQL form the foundational layer for Linked data.
• Linked data offers a number of advantages:
• Data integration with small impact on legacy systems;
• Semantic interoperability;
• Creativity and innovation through context and knowledge creation.

70

Group questions
Is there supply and demand for (Linked) Open
Government Data in your country?
What are, in your opinion, the expected benefits
and pitfalls of Linked Data?
Do you know if there are any Linked (Open) Data
initiatives in your country? If so, how many stars
would you give them?

71

Introduction to linked data

72

Download the slides from
my research group website
www.dbgroup.unimore.it
or on SlideShare
http://www.slideshare.net/polaura

73

References
Some of the materials used in these slides have been rearranged from
- Slides of the “Knowledge Engineering with Semantic Web Technologies 2015” course held by Dr. Harald Sack
https://open.hpi.de/courses/semanticweb2015
- Slides of "Introduction to linked data" by Open Data Support
http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165
- Slides of "Usage of Linked Data: Introduction and Application Scenarios" and "Querying Linked Data" by Barry Norton, EUCLID project

74

Further readings
Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
EUCLID - Course 1: Introduction and Application Scenarios.
http://www.euclid-project.eu/modules/course1
Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck.
http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer.
http://linkeddatabook.com/editions/1.0/

75

Related projects and initiatives
LOD2 FP7 project, http://lod2.eu/
The Open Knowledge Foundation, http://okfn.org/
W3C Semantic Web, http://www.w3.org/standards/semanticweb/
EUCLID, http://projecteuclid.org/
ISA Programme, http://ec.europa.eu/isa/
W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page
LOD Around The Clock FP7 project, http://latc-project.eu/
Data.gov.uk, http://data.gov.uk/linked-data

More Related Content

Introduction to linked data

  • 1. Introduction to Linked Data Laura Po - Exploration, Visualization and Querying of Linked Open Data sources 2nd Keystone Training School - Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain. Laura Po
  • 3. Objectives By the end of this module you should have an understanding of • What is linked data • What is open data • What is the difference between linked and open data • How to publish linked data (5-star schema) • What are the linked data principles and the linked data technologies (the semantic web stack) • The economic and social impact of linked data
  • 5. The Web of Data The evolution from a Web of linked documents to a web of linked data The Web as a huge decentralized database (knowledge base) of machine- accessible data Web of documents... Web of linked data...
  • 6. The evolution of the web • The Web started as a collection of documents published online – accessible at Web location identified by a URL. • These documents often contain data about real- world resources which is mainly human-readable and cannot be understood by machines. • The Web of Data is about enabling the access to this data, by making it available in machine- readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data, and put it together to do all kinds of things with it (permitted by the licence). Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. 2 types of machine-readable data: • human-readable data that is marked upso that it can also be understood by computers, e.g. microformats, RDFa; • data formats intended principally for computers, e.g. RDF, XML and JSON.
  • 7. Linked Data and the ‘Web of Data‘ ● Term refers to an idea originally from Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html) ● Set of best practices for publication and linking of structured data on the web ● Basic assumption: The value of data on the web increases when they are connected to other data sources M.Hausenblas, Quick Linked Data Introduction, http://www.slideshare. net/mediasemanticweb/quick-linked-data-introduction The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.
  • 8. Defining linked data “Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.” EC ISA Case Study: How Linked Data is transforming eGovernment
  • 9. Linked Data Principles 1. Use URIs as names for things. 2. Use HTTP URIs, so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 4. Include links to other URIs, so that they can discover more things.
  • 11. How to get Data from the Web? ● Data can only be found on the Web, if it is available at some website JDBC Browser Web Server Database HTTP
  • 12. How to get Data from the Web? ● There is a number of different (proprietary) Web APIs, data exchange formats and Mashups on top of that Database 1 Database 2 Database 3 Database 4 Web API 1 Web API 2 Web API 3 Web API 4 Mashup
  • 13. In the Web today... ● Data is locked up in small data islands ● Other applications usually cannot access this data... Database Database Database Database Database Database Database Database Database Database
  • 14. Semantic Web Technologies , Dr. Harald Sack, Hasshttp://www.w3.org/2009/Talks/0204-ted-tbl/#(22)
  • 15. How to get rid of Closed Data Islands? Database 1 Database 2 Database 3 Database 4 ● Apply Semantic Web technologies ○ to publish (structured) data on the web ○ to draw connections from one data source to data from other data sources RDF data RDF data RDF data RDF data
  • 16. Linked Data Principles (1/4) 1. Use URIs as names for things. ○ URIs do not only identify documents but also arbitrary objects of the real world as well as abstract concepts https://viaf.org/viaf/32197206/ http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970 http://www.imdb.com/title/tt3659388
  • 17. Linked Data Principles (2/4) 2. Use HTTP URIs, so that people can look up those names. ○ HTTP URIs (URLs) as globally unique names enable dereferencing of associated information in the Web ○ via http Content Negotiation machine and humans can access the resource identified by the URI RDF Document URI represents Designatum http://dbpedia.org/resource/ Wolfgang_Amadeus_Mozart http://dbpedia.org/page/ Wolfgang_Amadeus_Mozart http://dbpedia.org/data/ Wolfgang_Amadeus_Mozart URI represents Designator URI represents Designator HTML Document FOR MACHINE FOR HUMANS Dereferencable Every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI we found the definition of the term.
  • 18. Linked Data Principles (3/4) 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) ○ RDF as universal data model for publishing structured data on the Web ○ Make all URIs in the RDF graph dereferenceable ○ Avoid RDF constructs that cause problems in Linked Data context ■ RDF Reification ■ RDF Collections und Containers ■ unnamed Blank Nodes
  • 19. Linked Data Principles (4/4) 4. Include links to other URIs, so that they can discover more things. ○ Link RDF references among data between different data sources: ○ owl:sameAs –create a link between individuals ○ rdfs:seeAlso – states that a resource may provide additional information ○ Relationship Links Links to external LOD Entitites related with the original entity ○ Identity Links Links to external LOD Entities referring to the same object or concept ○ Vocabulary Links Links to definitions of the original entity
  • 20. Advantages of Linked Open Data vs. APIs ○ Simple and generic API for various heterogeneous data sources enables simple reuse and data sharing among applications ○ RDF Data model guarantees (simple) extensibility ○ Transport via http, standard Port 80, prevents firewall adaption ○ Ontologies enable meaningful connections between data sources ○ Reasoning over Linked Data enables to generate new knowledge, i.e. inference from implicit to explicit knowledge
  • 23. The Semantic Web Technology Stack http://dbpedia.org/resource/ Santiago_de_Compostela Santiago de Compostela URI - Uniform Resource Identifier
  • 24. From Wikipedia to DBpedia https://en.wikipedia.org/wiki/ Santiago_de_Compostela http://dbpedia.org/resource/Santiago_de_Compostela
  • 25. From Wikipedia to DBpedia http://dbpedia.org/resource/Santiago_de_Compostela
  • 26. RDF Resource Description Framework :Santiago_de_Compostela rdf:type dbo:City . :Santiago_de_Compostela dbo:country dbr:Spain . :Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela . dbr:University_of_Santiago_de_Compostela dbp:city dbr:Santiago_de_Compostela . :Santiago_de_Compostela dbp:populationTotal 95671 (xsd:integer) . ... :Santiago rdf:type dbo:City . RDF Subject RDF Property RDF Object RDF Triple From Wikipedia to DBpedia http://dbpedia.org/resource/Santiago_de_Compostela
  • 27. ● Resource ○ can be everything ○ must be uniquely identified and referencable via URI ● Description ○ = description of resources ○ via representing properties and relationships among resources as graphs ● Framework ○ = combination of web based protocolls (URI, HTTP, XML, Turtle, JSON, …) ○ based on formal model (semantics) ● Knowledge in RDF is expressed as a list of statements ● all RDF statements follow the same simple schema (= RDF Triple) Resource Description Framework
  • 28. Resource Description Framework ● RDF Statements (RDF-Triple): + Object / ValueSubject + Property URI URI URI / Literal RDF Building Blocks <http://dbpedia.org/resource /Santiago_de_Compostela> <http://dbpedia.org/ontology/ populationTotal> N-Triples Serialization “95671” . graph representation <http://dbpedia.org/resource /Santiago_de_Compostela> <http://dbpedia.org/ontology/ populationTotal> “95671” .
  • 29. Resource Description Framework ● URIs and Literals ○ URIs reference resources uniquely ○ Literals describe data values that don’t have a separate existence <http://dbpedia.org/resource/Spain> <http://dbpedia.org/ontology /country> <http://dbpedia.org/resource /Santiago_de_Compostela> <http://dbpedia.org/ontology /populationTotal> “95671” .
  • 30. RDF Schema dbo:City rdf:type owl:class . dbo:City rdfs:subClassOf dbo:Settlement . dbo:foundationPlace rdfs:range dbo:City. ... City foundation Place Settlement rdfs:isSubclassOf The Semantic Web Technology Stack http://dbpedia.org/ontology/City rdfs:range
  • 31. logical constraint City Spain Madrid dbo:country Small_town ∩ Capital = ∅ rdf:type rdfs:isSubclassOf ∀x. ( City(x)∧ seatOfGovernment(x) → Capital(x) ) description logics + logical rules classes entities The Semantic Web Technology Stack
  • 32. Look for a l l cities located i n the same area of Santiago de Compostela (use the property dbp:subdivisionName) PREFIX dcterms: <http://purl.org/dc/terms/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX dbp: <http://dbpedia.org/property/> PREFIX dbr: <http://dbpedia.org/resource/> SELECT distinct ?area ?city FROM <http://dbpedia.org/> WHERE{ ?area dbp:subdivisionName dbr:Santiago_de_Compostela. ?area dbp:subdivisionName ?city. } The Semantic Web Technology Stack http://dbpedia.org/sparql
  • 33. http://dbpedia.org/sparql Look fo r a l l cities located i n the same area of Santiago de Compostela (use the property dbp:subdivisionName)
  • 34. Query language designed to use a syntax similar to SQL for retrieving data from relational databases. Different query forms: • SELECT returns variables and their bindings directly. • CONSTRUCT returns a single RDF graph specified by a graph template. • ASK test whether or not a query pattern has a solution. Returns yes/no. • DESCRIBE returns a single RDF graph containing RDF data about resources. SPARQL – * Protocol and RDF Query Language
  • 35. SQL versus SPARQL SQL SPARQL Based on relations (tables). Based on labelled directed graphs. The relations (tables) to be matched over should be indicated. Assumes a default graph. (The FROM clause populates this with specific identified subgraphs). (Retrieval) queries produce a relation from a relation. SPARQL SELECT queries produce a relation from a graph. CONSTRUCT queries (considered later) produce a graph from a graph.
  • 37. The application of the Linked Data Principles leads to a ,Web of Data‘ >1014Datasets >74B RDF Triples 808M Links as of August 2014
  • 38. The Development of the Web of Data May 2007
  • 39. The Development of the Web of Data Nov 2007
  • 40. The Development of the Web of Data
  • 41. The Development of the Web of Data July 2009
  • 42. The Development of the Web of Data Aug 2014
  • 43. Linked Open Data ○ Public Linked Data resources in the Web, licensed as Creative Common CC-BY ○ Tim Berners-Lee‘s 5-Star Criteria for Linked Open Data ★★ ★★★ Available on the web (whatever format) but with an open licence, to be Open Data Available as machine-readable structured data (e.g. excel instead of image scan of a table) as (2) plus non-proprietary format (e.g. CSV instead of excel) ★★★★★ All the above, plus: link your data to other people’s data to provide context ★★★★ All the above plus: use open standards from W3C (URI,RDF and SPARQL) to identify things, so that people can point at your stuff ★
  • 46. December 2007: 8 principles for Open Government Data: complete; primary (not aggregate); up to date; accessible; machine processable; non-discriminatory; non-proprietary; no licence fees. https://opengovdata.org/
  • 47. Linked Data vs Open Data. Open data: data can be published and be publicly available under an open licence without linking to other data sources. Linked data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence. “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” - OpenDefinition.org See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
  • 48. Linked (open) government data • Flexible data integration: LOGD facilitates data integration and enables the interconnection of previously disparate government datasets. • Increase in data quality: The increased (re)use of LOGD triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected. • New services: The availability of LOGD gives rise to new services offered by the public and/or private sector. • Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions. See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
  • 49. Key milestones for linked government data
  • 51. Linked Data - A Guided Tour ● Datasets ordered by category http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
  • 52. Government ● 183 datasets ● top 10 highest indegree: reference.data.gov.uk ● 48 proprietary vocabularies used ● c. 21% fully dereferencable. Dereferencable: every term in a LOD source must be accessible via its URI through an HTTP GET; once we access the URI, we find the definition of the term. The dereferencability quota of a LOD source is defined as the number of dereferencable terms divided by the number of all terms collected in the source. Fully dereferencable LOD source: there exists a definition for all URIs. Partially dereferencable LOD source: a definition can be retrieved for some terms, but not for all.
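The dereferencability quota defined above is a simple ratio, sketched below. The `has_definition` callback stands in for the HTTP GET check and is an assumption of this example; here it is stubbed with a fixed set so that nothing touches the network. The example URIs are invented, and the label "not dereferencable" for a quota of zero is likewise an assumption, since the slide names only the full and partial cases.

```python
from typing import Callable, Iterable


def dereferencability_quota(terms: Iterable[str],
                            has_definition: Callable[[str], bool]) -> float:
    """Number of dereferencable terms divided by the number of all terms."""
    term_list = list(terms)
    if not term_list:
        raise ValueError("the source contains no terms")
    found = sum(1 for term in term_list if has_definition(term))
    return found / len(term_list)


def classify(quota: float) -> str:
    """Map a quota to the categories used on the slide."""
    if quota == 1.0:
        return "fully dereferencable"
    if quota > 0.0:
        return "partially dereferencable"
    return "not dereferencable"  # label assumed; not defined on the slide


if __name__ == "__main__":
    # Hypothetical vocabulary terms; in practice has_definition would issue
    # an HTTP GET on each URI and check that a definition comes back.
    terms = [
        "http://example.org/vocab#City",
        "http://example.org/vocab#Region",
        "http://example.org/vocab#Country",
        "http://example.org/vocab#Broken",
    ]
    reachable = set(terms[:3])
    quota = dereferencability_quota(terms, lambda uri: uri in reachable)
    print(quota, classify(quota))
```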
  • 53. Media ● 22 datasets ● 22 proprietary vocabularies used ● 0% fully dereferencable ● 9% partially dereferencable
  • 54. User Generated Content ● 48 datasets ● top 10 highest outdegree: semanticweb.org ● 30 proprietary vocabularies used ● 13% fully dereferencable ● 10% partially dereferencable
  • 55. Linguistics ● no statistics available so far
  • 56. Bibliographic Data ● 96 datasets ● top 10 highest indegree: data.semanticweb.org ● top 10 highest outdegree: bibsonomy.org ● 58 proprietary vocabularies used ● 21% fully dereferencable ● 7% partially dereferencable
  • 57. Life Sciences ● 83 datasets ● 35 proprietary vocabularies used ● 28% fully dereferencable ● 6% partially dereferencable
  • 58. Cross Domain ● 41 datasets ● top 10 highest indegree: dbpedia.org, w3.org, lexvo.org ● 55 proprietary vocabularies used ● 27% fully dereferencable ● 11% partially dereferencable
  • 59. Social Networking ● 520 datasets ● top 10 highest indegree: quitter.se, status.net, … ● top 10 highest outdegree: deri.org, harth.org,... ● 128 proprietary vocabularies used ● 16% fully dereferencable ● 6% partially dereferencable
  • 60. Geographic ● 21 datasets ● top 10 highest indegree: geonames.org ● 24 proprietary vocabularies used ● 21% fully dereferencable ● 4% partially dereferencable
  • 61. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● OWL ○ owl:sameAs connects identical individuals ○ owl:equivalentClass connects equivalent classes
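Since owl:sameAs is reflexive, symmetric and transitive, a set of sameAs links partitions URIs into groups that each denote one individual. The sketch below computes those groups with a small union-find structure; the prefixed names in the example are invented and do not refer to real resources.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_groups(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group URIs connected, directly or transitively, by owl:sameAs links."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen nodes as their own root, then walk to the root,
        # shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical links between three datasets.
    links = [
        ("dbr:A", "geo:1"),
        ("geo:1", "wd:Q"),
        ("dbr:B", "geo:2"),
    ]
    for group in same_as_groups(links):
        print(group)
```

An application that merges descriptions from several datasets could use such groups to decide which URIs to treat as one resource.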
  • 62. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● SKOS ○ "Simple Knowledge Organization System" ○ based on RDF and RDFS ○ applied for definitions and mappings of vocabularies and ontologies ■ skos:Concept (classes) ■ skos:narrower ■ skos:broader ■ skos:related ■ skos:exactMatch (vocabulary) ■ skos:narrowMatch ■ skos:broadMatch ■ skos:relatedMatch
  • 63. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● UMBEL ○ "Upper Mapping and Binding Exchange Layer" ○ Subset of OpenCyc as RDF triples, based on SKOS and OWL 2 ○ Upper ontology with 28,000 concepts (skos:Concept) ○ 46,000 mappings into DBpedia, GeoNames and others (owl:equivalentClass, rdfs:subClassOf) ○ Links to more than 2 million Wikipedia pages
  • 65. Member State initiatives – some examples. Some examples of supra-national, national, regional and private initiatives in the area of linked (open) data across Europe. DE – Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg. IT – Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration. NL – Building and address register: the Dutch Address and Buildings base register published as linked data. UK – Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line. UK – Companies House: publishing basic company details as linked data using a simple URI for each company in their database. See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
  • 66. Linked Government Data & Metadata initiatives funded by the European Commission: ADMS.SW, Core Public Service Vocabulary
  • 67. Linked Government Data Pilots: http://health.testproject.eu/PPP/ http://maritime.testproject.eu/CISE/ http://cpsv.testproject.eu/CPSV/
  • 69. Conclusion • Linked data is a set of design principles for sharing machine-readable data on the Web. • Linked data and open data are not the same. • URIs, RDF and SPARQL form the foundational layer for linked data. • Linked data offers a number of advantages: • Data integration with small impact on legacy systems; • Semantic interoperability; • Creativity and innovation through context and knowledge creation.
  • 70. Group questions Is there supply and demand for (Linked) Open Government Data in your country? What are, in your opinion, the expected benefits and pitfalls of Linked Data? Do you know if there are any Linked (Open) Data initiatives in your country? If so, how many stars would you give them?
  • 72. Download the slides from my research group website, www.dbgroup.unimore.it, or from SlideShare, http://www.slideshare.net/polaura
  • 73. References. Some of the materials used in these slides have been rearranged from: - Slides of the "Knowledge Engineering with Semantic Web Technologies 2015" course held by Dr. Harald Sack, https://open.hpi.de/courses/semanticweb2015 - Slides of the "Introduction to linked data" of Open Data Support, http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165 - Slides of "Usage of Linked Data: Introduction and Application Scenarios" and "Querying Linked Data" by Barry Norton, EUCLID project
  • 74. Further readings: - Linked Open Government Data. Li Ding, Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454 - EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1 - Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf - Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
  • 75. Related projects and initiatives: LOD2 FP7 project, http://lod2.eu/; The Open Knowledge Foundation, http://okfn.org/; W3C Semantic Web, http://www.w3.org/standards/semanticweb/; EUCLID, http://www.euclid-project.eu/; ISA Programme, http://ec.europa.eu/isa/; W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page; LOD Around The Clock FP7 project, http://latc-project.eu/; Data.gov.uk, http://data.gov.uk/linked-data