The document provides an overview of semantic integration using Apache Jena and Apache Stanbol. It discusses using semantic web technologies like RDF, ontologies, and vocabularies to integrate data from various sources and allow machines and people to better understand and use the integrated information. It also provides technical details on Apache Jena, which can store and query RDF data, and Apache Stanbol, a semantic processing engine that can enhance content with metadata.
Semantic Integration with Apache Jena and Stanbol
1. October 22, 2014 Fogbeam Labs
Semantic Integration with Apache Jena and Apache Stanbol
All Things Open
Raleigh, NC
Oct. 22, 2014
3. October 22, 2014 Fogbeam Labs
What do we mean by “Semantic Integration”?
● Integration, generally
● Letting things “talk to each other” so they can act as a cohesive whole
● Uses the Semantic Web technology stack
● Data integration using RDF, well-known vocabularies, as well as in-house vocabularies and ontologies
● Relationship to EAI, MDM, etc.?
4. October 22, 2014 Fogbeam Labs
Uses Semantic Web technology to do what, exactly?
● Work with knowledge, not labels
● Express metadata about “things”
● And the relationships between those “things” and their characteristics
● Reason about those “things” in order to:
● Find contextually relevant information
● Search with greater precision
● Generate new knowledge
● ???
5. October 22, 2014 Fogbeam Labs
Knowledge?
● What's the difference between “Data”, “Information”, “Knowledge”, etc.?
● Different ways of talking about this.
● DIKW Pyramid is a popular model
● http://en.wikipedia.org/wiki/DIKW_Pyramid
7. October 22, 2014 Fogbeam Labs
Knowledge?
● For our purposes today...
● Unambiguous Identifiers
● Ontology
● Type / Class information
● Relationships
8. October 22, 2014 Fogbeam Labs
Working With Knowledge instead of Labels
● Backing up – what do we mean by “Semantic” anyway?
● Is “Java”:
● An island in Indonesia
● A slang word for coffee
● A programming language invented by Sun Microsystems
● Using URIs as labels
● In order to talk about “the semantics of Java” we have to know unambiguously which “java” we are talking about.
9. October 22, 2014 Fogbeam Labs
Ontology
● The attributes / properties of a Thing
● Set membership of a Thing
● rdfs:Class
● Relationships between Things
● dc:relation
● dc:subject
● rdfs:subClassOf
● skos:narrower, skos:broader
10. October 22, 2014 Fogbeam Labs
Data Table Slide
id    color  size    manufacturer
2345  Blue   Large   Acme
2378  Red    Small   Cullet
3421  Green  Medium  Acme
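
One way to read a table like this as RDF-style data: each row is a subject, each column a predicate, and each cell an object. Below is a minimal Python sketch of that reading. It uses plain tuples rather than a real RDF library, and the `http://example.org/` namespaces and the `row_to_triples` helper are hypothetical names chosen for illustration only.

```python
# Hypothetical namespaces (assumptions for this sketch); a real system
# would use its own URIs or a well-known vocabulary.
PRODUCT_NS = "http://example.org/product/"
VOCAB_NS = "http://example.org/vocab#"

rows = [
    {"id": "2345", "color": "Blue", "size": "Large", "manufacturer": "Acme"},
    {"id": "2378", "color": "Red", "size": "Small", "manufacturer": "Cullet"},
    {"id": "3421", "color": "Green", "size": "Medium", "manufacturer": "Acme"},
]


def row_to_triples(row, key="id"):
    """Turn one table row into (subject, predicate, object) tuples.

    The key column becomes the subject URI; every other column becomes
    a predicate URI with the cell value as a plain literal object.
    """
    subject = PRODUCT_NS + row[key]
    return [
        (subject, VOCAB_NS + column, value)
        for column, value in row.items()
        if column != key
    ]


# Flatten all rows into one list of triples.
triples = [t for row in rows for t in row_to_triples(row)]

for s, p, o in triples:
    print(s, p, repr(o))
```

Three rows with three non-key columns each give nine triples. The row id no longer stands alone as a label; it is part of a URI that can be referenced from other data sets.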
12. October 22, 2014 Fogbeam Labs
Types & Relationships
● RDF/S
● superclass / subclass relationships for Classes
● superclass / subclass relationships for Properties
● domain / range relationship between Properties and Classes
● OWL
● class equivalence
● entity equivalence
● class disjointness
● SKOS
● narrower / broader relationship between Concepts
● ordered collections
13. October 22, 2014 Fogbeam Labs
But
● But... we're not here for a course on Epistemology or Metaphysics...
14. October 22, 2014 Fogbeam Labs
Synonyms
● Smart Data
● Semantic Data
● Knowledge
15. October 22, 2014 Fogbeam Labs
Semantic Integration Layer
(diagram labels)
● Enterprise Applications (ERP, SFA, CRM, etc.)
● Document Repositories: DMS, Wikis, Blogs, Forums, etc.
● “Big Data”: Data Warehouses, Data Lakes, etc.
● Internet of Things, M2M, Sensor Data, etc.
● “Open Data”: SEC filings, EPA data, building permits, etc.
● Stanbol
● Jena
● Users
16. October 22, 2014 Fogbeam Labs
But wait, there's more...
● From relational database to Semantic Web -> R2RML
● D2RQ
● http://d2rq.org
● ANY23 – Anything to Triples
● http://any23.apache.org
● OpenRefine, Tika, JSoup, Boilerpipe, ...
● Potentially, anything that might be part of a normal ETL workflow
17. October 22, 2014 Fogbeam Labs
So, what is the Semantic Web?
An evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content.
Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.
...prospective future possibilities that are yet to be implemented or realized.
A set of design principles, collaborative working groups, and a variety of enabling technologies.
18. October 22, 2014 Fogbeam Labs
What is the Semantic Web? (continued)
“... supposed to make data located anywhere on the Web accessible and understandable, both to people and to machines.”
(Explorer's Guide to the Semantic Web, p. 3)
“... more a vision than a technology.”
(Explorer's Guide to the Semantic Web, p. 3)
“...a fluid, evolving, informally defined concept rather than an integrated, working system.”
(Explorer's Guide to the Semantic Web, p. 3)
19. October 22, 2014 Fogbeam Labs
The “Semantic Web Layer Cake”
20. October 22, 2014 Fogbeam Labs
RDF – Resource Description Framework
● Resources unambiguously named using URIs
● Everything is a triple... ex: “the shoe is red” would be the triple with subject = “shoe”, predicate (or property) = “color”, and object (or value) = “red”
● Serialization formats include XML (known as RDF/XML) and developer-friendly serialization formats including N3, Turtle, and JSON-LD
● Models statements as “triples”: Subject, Predicate, Object (also called Subject, Property, Value)
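
As a rough illustration of the “shoe is red” triple in a concrete serialization, the sketch below formats one triple as a single N-Triples line, a simple line-based RDF format in the same family as N3 and Turtle. The `to_ntriples` helper and the `http://example.org/` URIs are assumptions made for this example; a real application would use an RDF library's serializer instead.

```python
def to_ntriples(subject, predicate, obj, obj_is_uri=False):
    """Format one triple as a single N-Triples line.

    subject and predicate are URIs; obj is either a URI or a plain
    string literal, depending on obj_is_uri.
    """

    def uri(value):
        return "<" + value + ">"

    def literal(value):
        # Escape backslashes, double quotes, and line breaks.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return '"' + escaped + '"'

    object_term = uri(obj) if obj_is_uri else literal(obj)
    return f"{uri(subject)} {uri(predicate)} {object_term} ."


# Hypothetical URIs for the shoe and the color property.
line = to_ntriples(
    "http://example.org/shoe/1",
    "http://example.org/vocab#color",
    "red",
)
print(line)
```

The subject and predicate are URIs in angle brackets, the value is a quoted literal, and the statement ends with a period.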
21. October 22, 2014 Fogbeam Labs
Reasoning over data
● OWL / SKOS / etc.
● Ability to access “Inferred” triples
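
To make “inferred” triples concrete, here is a small Python sketch that applies two standard RDFS entailment rules (subclass transitivity and type propagation) until nothing new appears. It is only an illustration of the idea, not how Jena's reasoners are implemented; the `ex:` names and the `infer` function are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Asserted triples, using made-up ex: names for illustration.
asserted = {
    ("ex:Sneaker", SUBCLASS_OF, "ex:Shoe"),
    ("ex:Shoe", SUBCLASS_OF, "ex:Product"),
    ("ex:item2345", RDF_TYPE, "ex:Sneaker"),
}


def infer(triples):
    """Apply two RDFS rules until no new triples appear.

    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A),       (A subClassOf B) => (x type B)
    """
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only chain through a subClassOf triple whose subject
                # is the object of the first triple.
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


# Triples that were derived rather than stated.
inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Nobody stated that `ex:item2345` is an `ex:Product`, but that triple is available after inference, so a query for all products would find it.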
23. October 22, 2014 Fogbeam Labs
Querying with SPARQL
● Basic queries
● Using inferred triples
● Federated Queries
● DBpedia example
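
A basic SPARQL query is, at its core, a set of triple patterns with variables that are matched against the data. The Python sketch below shows that matching idea on a handful of tuples. It is a toy model and not a SPARQL engine; the `match` and `query` helpers and the `ex:` names are assumptions for illustration.

```python
def match(pattern, triple, bindings):
    """Try to unify one triple pattern with one triple.

    Terms starting with "?" are variables. Returns an extended copy of
    bindings on success, or None if the triple does not fit.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Evaluate a list of triple patterns as a conjunctive query."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


data = [
    ("ex:item2345", "ex:color", "Blue"),
    ("ex:item2345", "ex:manufacturer", "Acme"),
    ("ex:item2378", "ex:color", "Red"),
    ("ex:item2378", "ex:manufacturer", "Cullet"),
    ("ex:item3421", "ex:color", "Green"),
    ("ex:item3421", "ex:manufacturer", "Acme"),
]

# Roughly equivalent to:
#   SELECT ?item ?color
#   WHERE { ?item ex:manufacturer "Acme" . ?item ex:color ?color }
results = query(
    [("?item", "ex:manufacturer", "Acme"), ("?item", "ex:color", "?color")],
    data,
)
for row in results:
    print(row["?item"], row["?color"])
```

The shared variable `?item` is what joins the two patterns: only items made by Acme survive the first pattern, and the second pattern adds their color.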
24. October 22, 2014 Fogbeam Labs
Semantic Integration in the Enterprise
● Knowledge Management
● Collaboration
● BPM
● Business Intelligence
● Predictive Analytics
25. October 22, 2014 Fogbeam Labs
Apache Jena
● RDF API
● Triplestore (TDB)
● SPARQL Execution Engine (ARQ)
● OWL Reasoner
● SPARQL endpoint (Fuseki)
● Inference API
● Use built-in reasoners
● Or define your own inference rules
● http://jena.apache.org
27. October 22, 2014 Fogbeam Labs
Not AI, but...
● Newer reasoners can utilize new techniques, including Bayesian inference, any sort of machine learning models, cognitive models, new NLP techniques, etc.
● Same for Stanbol extraction – you can write your own extractors and new extractors will be coming down the pipe.