EDUTELLA: P2P Networking for the Semantic Web
Wolfgang Nejdl (a), Boris Wolf (a), Wolf Siberski (a), Changtao Qu (a), Stefan Decker (b), Michael Sintek (c),
Ambjörn Naeve (d), Mikael Nilsson (d), Matthias Palmér (d), Tore Risch (e)
(a) Learning Lab Lower Saxony, University of Hannover, Germany
(b) USC Information Sciences Institute, Marina del Rey, CA, USA
(c) DFKI GmbH, Kaiserslautern, Germany
(d) Centre for user oriented IT Design, Royal Institute of Technology, Stockholm, Sweden
(e) Department of Information Science, Uppsala University, Sweden
Metadata for the World Wide Web is important, but metadata for Peer-to-Peer (P2P) networks is absolutely
crucial. In this paper we discuss the open source project Edutella which builds upon metadata standards defined
for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications. Edutella is the
first system which brings together RDF and P2P concepts and exploits their strengths in a common framework,
suitable for building general schema-based P2P networks for distributed and dynamic information providers. We
describe the goals and main services this infrastructure will provide and the architecture to connect Edutella Peers
based on exchange of RDF metadata. As the query service is one of the core services of Edutella, upon which other
services are built, we specify in detail the Edutella Common Data Model (ECDM) as basis for the Edutella query
exchange language (RDF-QEL-i) and format implementing distributed queries over the Edutella network. Finally,
we briefly discuss registration and mediation services, and introduce the prototype and application scenario for
our current Edutella-aware peers.
1. Introduction
While in the server/client-based environment
of the World Wide Web metadata are useful and
important, for Peer-to-Peer (P2P) environments
metadata are absolutely crucial. Information resources in P2P networks are no longer organized
in hypertext-like structures that can be navigated, but are stored on numerous peers waiting to be queried for these resources if we know
what we want to retrieve and which peer is able
to provide that information. Querying peers requires metadata describing the resources managed by these peers, which is easy to provide for
specialized cases, but non-trivial for general applications.
P2P applications have been successful for special cases like exchanging music files. However,
retrieving “all recent songs by Madonna” does not
need complex query languages nor complex metadata, so special purpose formats for these P2P
applications have been sufficient. In other scenarios,
like exchanging educational resources, queries are
more complex, and have to build upon standards
like IEEE-LOM/IMS [1,2] metadata with up to
100 metadata entries, which might even be complemented by domain specific extensions.
Furthermore, by concentrating on domain specific formats, current P2P implementations appear to be fragmenting into niche markets instead
of developing unifying mechanisms for future P2P
applications. There is indeed a great danger (as
already discussed in [3]) that the unifying interfaces
and protocols introduced by the World Wide Web
will be lost in the forthcoming P2P arena.
The Edutella project [4] addresses these shortcomings of current P2P applications by building
on the W3C metadata standard RDF [5,6]. The
project is a multi-staged effort to scope, specify,
architect and implement an RDF-based metadata
infrastructure for P2P networks based on the recently announced JXTA framework [7]. The
initial Edutella services will be Query Service (standardized query and retrieval of RDF metadata),
Replication Service (providing data persistence /
availability and workload balancing while maintaining data integrity and consistency), Mapping
Service (translating between different metadata
vocabularies to enable interoperability between
different peers), Mediation Service (defining views
that join data from different metadata sources
and reconcile conflicting and overlapping information) and Annotation Service (annotating materials stored anywhere within the Edutella Network).
Our vision is to provide the metadata services
needed to enable interoperability between heterogeneous JXTA applications. Our first application
will focus on a P2P network for the exchange of
educational resources (using schemas like IEEE
LOM, IMS, and ADL SCORM [8] to describe
course materials); other application areas will follow.
In Sections 2 and 3 we describe the background
and framework of the Edutella architecture and
our educational application scenario. Then, as
the query service is one of the core services of
Edutella, upon which other services are built,
we specify in detail in Section 4 the Edutella
common data model (ECDM) as basis for the
Edutella query exchange language and format implementing distributed queries over the Edutella
network. Finally, we sketch translations from the
Edutella CDM to different query languages (Section 5), briefly discuss registration and mediation
services (Section 6), and introduce the prototype
and application scenario for our current Edutella
aware peers (Section 7).
2. Background
2.1. The JXTA P2P Framework
JXTA is an Open Source project [9,7] supported and managed by Sun Microsystems. In
essence, JXTA is a set of XML based protocols
[10] to cover typical P2P functionality. It provides a Java binding offering a layered approach
for creating P2P applications (core, services, applications, see Figure 1, reproduced from [7]). In
addition to remote service access (such as offered
by SOAP), JXTA provides additional P2P protocols and services, including peer discovery, peer
groups, peer pipes, and peer monitors. Therefore
JXTA is a very useful framework for prototyping
and developing P2P applications.
Figure 1. JXTA Layers (JXTA Applications, JXTA Services and JXTA Core, on top of any peer on the expanded Web)
This layered approach fits very nicely into our
application scenarios defined for Edutella:
Edutella Services (described in web service
languages like DAML-S or WSDL, etc.) complement the JXTA Service Layer, building upon the
JXTA Core Layer, and
Edutella Peers live on the Application Layer,
using the functionality provided by these Edutella
services as well as possibly other JXTA services.
On the Edutella Service layer, we define data
exchange formats and protocols (how to exchange
queries, query results and other metadata between Edutella Peers), as well as APIs for advanced functionality in a library-like manner. Applications like repositories, annotation tools or
GUI interfaces connected to and accessing the
Edutella network are implemented on the application layer.
2.2. Educational Context
Every university usually already has a
large pool of educational resources distributed
over its institutions. These are under the control of
individual entities or persons, and it is unlikely
that these entities will give up their control, which
explains why all approaches for the distribution
of educational media based on central repositories
have failed so far. Furthermore, setting up and
maintaining central servers is costly. The costs
are hardly justifiable, since a server distributing
educational material would not directly benefit
the sponsoring university.
We believe that, in order to really facilitate
the exchange of educational media, approaches
based on metadata-enhanced peer-to-peer (P2P)
networks are necessary.
In a typical P2P-based e-learning scenario,
each university acts not only as content provider
but also as content consumer, including local annotation of resources produced at other sites. As
content providers in a P2P network they will not
lose control over their learning resources
but still provide them for use within the network.
As content consumers, both teachers and students benefit from having access not only to a
local repository, but to a whole network, using
queries over the metadata distributed within the
network to retrieve required resources.
P2P networks have already been quite successful for exchanging data in heterogeneous environments, and have been brought into focus with
services like Napster and Gnutella, providing access to distributed resources like MP3-coded audio data. However, pure Napster- and Gnutella-like
approaches are not suitable for the exchange
of educational media. For example, the metadata in Gnutella is limited to a file name and a
path. While this might work for files with titles
like “Madonna - Like a Virgin”, it certainly does
not work for “Introduction to Algebra - Lecture
23”. Furthermore, these special purpose services
lead to fragmented communities which use special
purpose clients to access their service.
The educational domain is in need of a much
richer metadata markup of resources, a markup
that is often highly domain and resource type specific. In order to facilitate interoperability and
reusability of educational resources, we need to
build a system supporting a wide range of such
resources. This places high demands on the interchange protocols and metadata schemata used in
such a system, as well as on the overall technical
structure. Also, we do not want to create yet another special purpose solution which is outdated
as soon as metadata requirements and definitions
change.
Our metadata-based peer-to-peer system therefore has
to be able to integrate heterogeneous
peers (using different repositories, query languages and functionalities) as well as different
kinds of metadata schemas. We find common
ground in the essential assumption that all resources maintained in the Edutella network can
be described in RDF, and all functionality in the
Edutella network is mediated through RDF statements and queries on them. For the local user,
the Edutella network transparently provides access to distributed information resources, and different clients/peers can be used to access these
resources. Each peer will be required to offer a
number of basic services and may offer additional
advanced services.
3. Edutella Services
Edutella connects highly heterogeneous peers
(heterogeneous in their uptime, performance,
storage size, functionality, number of users etc.).
However, each Edutella peer can make its metadata information available as a set of RDF statements. Our goal is to make the distributed nature of the individual RDF peers connected to
the Edutella network completely transparent by
specifying and implementing a set of Edutella services. Each peer will be characterized by the set
of services it offers.
Query Service. The Edutella query service
is the most basic service within the Edutella
network and will be described in more detail
in the second part of this paper. Peers register the queries they may be asked through the
query service, i.e. by specifying supported metadata schemas (e.g., “this peer provides metadata according to the LOM 6.1 or DCMI standards”) or by specifying individual properties or
even values for these properties (e.g., “this peer
provides metadata of the form dc_title(X,Y)”
or “this peer provides metadata of the form
dc_title(X,’Artificial Intelligence’)”). Queries are
sent through the Edutella network to the subset
of peers who have registered with the service to
be interested in this kind of query. The resulting
RDF statements / models are sent back to the
requesting peer.
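The registration-based routing described above can be illustrated with a minimal Python sketch. It is not the Edutella implementation: the `Registration` class, the `can_answer` and `route` functions, the peer names and the `lom_keyword` property are all hypothetical, and we assume that a registration is either a whole schema or a (property, value) pattern in which `None` stands for "any value".

```python
from dataclasses import dataclass, field

@dataclass
class Registration:
    """What a peer announces (hypothetical structure, for illustration only)."""
    schemas: set = field(default_factory=set)    # e.g. {"DCMI"}
    patterns: set = field(default_factory=set)   # e.g. {("dc_title", None)}; None = any value

def can_answer(reg, schema, prop, value=None):
    """True if the registration covers a query literal prop(X, value)."""
    if schema in reg.schemas:
        return True
    for registered_prop, registered_value in reg.patterns:
        if registered_prop != prop:
            continue
        # An unbound value on either side matches anything.
        if registered_value is None or value is None or registered_value == value:
            return True
    return False

def route(registry, schema, prop, value=None):
    """Return the (sorted) names of peers that registered for this kind of query."""
    return sorted(name for name, reg in registry.items()
                  if can_answer(reg, schema, prop, value))

registry = {
    "peer-a": Registration(schemas={"DCMI"}),
    "peer-b": Registration(patterns={("dc_title", "Artificial Intelligence")}),
    "peer-c": Registration(patterns={("lom_keyword", None)}),
}

print(route(registry, "DCMI", "dc_title", "Artificial Intelligence"))
```

In this sketch a query for dc_title(X,’Prolog’) would only be forwarded to the peer that registered the whole schema, since the second peer restricted its registration to one title value.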
Edutella Replication. This service complements local storage by replicating data in
additional peers to achieve data persistence /
availability and workload balancing while maintaining data integrity and consistency. Since
Edutella is mainly concerned with metadata,
replication of metadata is our initial focus. Replication of data might be an additional possibility
(though this complicates synchronization of updates).
Edutella Mapping, Mediation, Clustering. While groups of peers will usually agree
on using a common schema (e.g., SCORM or
IMS/LOM for educational resources), extensions
or variations might be needed in some locations. The Edutella Mapping service will be able
to manage mappings between different schemata
and use these mappings to translate queries over
one schema X to queries over another schema Y.
Mapping services will also provide interoperation
between RDF- and XML-based repositories. Mediation services actively mediate access between
different services, while clustering services use semantic information to set up semantic routing and
semantic clusters.
4. Edutella Query Service
The Edutella Query Service is intended to be a
standardized query exchange mechanism for RDF
metadata stored in distributed RDF repositories
and is meant to serve as both query interface
for individual RDF repositories located at single
Edutella peers as well as query interface for distributed queries spanning multiple RDF repositories. An RDF repository (or knowledge base) consists of RDF statements (or facts) and describes
metadata according to arbitrary RDFS schemas.
One of the main purposes is to abstract
from various possible RDF storage layer query
languages (e.g. SQL) and from different user
level query languages (e.g. RQL, TRIPLE): The
Edutella Query Exchange Language and the
Edutella common data model provide the syntax and semantics for an overall standard query
interface across heterogeneous peer repositories
for any kind of RDF metadata. The Edutella
network uses the query exchange language family
RDF-QEL-i (based on Datalog semantics
and subsets thereof) as a standardized query exchange language format, which is transmitted in
an RDF/XML format.
Figure 2. Knowledge Base as RDF Graph
We will start with a simple RDF knowledge
base and a simple query on this knowledge base
depicted in Figure 2, with the following
RDF/XML serialization 1:
<lib:Book about="http://www.xyz.com/sw.html">
<dc:title>Software Engineering</dc:title>
</lib:Book>
<lib:Book about="http://www.xyz.com/ai.html">
<dc:title>Artificial Intelligence</dc:title>
</lib:Book>
<lib:AI-Book about="http://www.xyz.com/pl.html">
<dc:title>Prolog</dc:title>
</lib:AI-Book>
Evaluating the following query (plain English)
“Return all resources that are a book
having the title ’Artificial Intelligence’
or that are an AI book.”
we get the query results shown in Figure 3, depicted as RDF-graph.
1 Using lib as namespace shorthand for ’http://www.lit.edu/types#’.
Figure 3. Query Results as RDF Graph (http://www.xyz.com/ai.html with rdf:type Book and dc:title Artificial Intelligence; http://www.xyz.com/pl.html with rdf:type AI-Book)
4.1. Query Exchange Architecture
Edutella peers are highly heterogeneous in
terms of the functionality (i.e. services) they offer.
A simple peer has RDF storage capability only.
The peer has some kind of local storage for RDF
triples (e.g., a relational database) as well as some
kind of local query language (e.g. SQL). In addition the peer might offer more complex services
such as annotation, mediation or mapping.
To enable the peer to participate in the
Edutella network, Edutella wrappers are used to
translate queries and results from the Edutella
query and result exchange format to the local format of the peer and vice versa, and to connect the
peer to the Edutella network by a JXTA-based
P2P library.
To handle queries the wrapper uses the common Edutella query exchange format and data
model for query and result representation. For
communication with the Edutella network the
wrapper translates the local data model into the
Edutella common data model ECDM described
in this paper and vice versa, and connects to the
Edutella Network using the JXTA P2P primitives, transmitting the queries based on the common data model ECDM in RDF/XML form (see
Figure 4).
In order to handle different query capabilities,
we define several RDF-QEL-i exchange language
levels, describing which kind of queries a peer
can handle (conjunctive queries, relational algebra, transitive closure, etc.). The same internal
data model is used for all levels.
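To make the role of a wrapper more concrete, the following Python sketch translates a purely conjunctive query into SQL for a peer that stores its triples in a relational table. This is an illustration under our own assumptions, not the Edutella wrapper code: the table layout `triples(s, p, o)`, the function `to_sql`, the convention that variables are strings starting with "?", and the abbreviated predicate names are all hypothetical. The sketch also assumes the query contains at least one variable.

```python
import sqlite3

def to_sql(patterns):
    """Translate a conjunction of (s, p, o) patterns into one SQL self-join.

    Terms starting with '?' are variables; everything else is a constant.
    """
    selects, wheres, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in seen:
                    # Repeated variable: join with its first occurrence.
                    wheres.append(f"{ref} = {seen[term]}")
                else:
                    seen[term] = ref
                    selects.append(f"{ref} AS {term[1:]}")
            else:
                wheres.append(f"{ref} = ?")
                params.append(term)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT DISTINCT {', '.join(selects)} FROM {tables}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params

# Local repository of a simple peer: the example knowledge base as a triple table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("http://www.xyz.com/sw.html", "rdf:type", "lit:Book"),
    ("http://www.xyz.com/sw.html", "dc:title", "Software Engineering"),
    ("http://www.xyz.com/ai.html", "rdf:type", "lit:Book"),
    ("http://www.xyz.com/ai.html", "dc:title", "Artificial Intelligence"),
    ("http://www.xyz.com/pl.html", "rdf:type", "lit:AI-Book"),
    ("http://www.xyz.com/pl.html", "dc:title", "Prolog"),
])

sql, params = to_sql([
    ("?X", "rdf:type", "lit:Book"),
    ("?X", "dc:title", "Artificial Intelligence"),
])
rows = db.execute(sql, params).fetchall()
print(rows)
```

Each query literal becomes one alias of the triple table, constants become parameterized equality conditions, and a variable shared between literals becomes a join condition.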
Figure 4. Query Processing in Edutella
4.2. Datalog Semantics for the Edutella Common Data Model (ECDM)
Datalog is a non-procedural query language
based on Horn clauses without function symbols.
A Horn clause is a disjunction of literals where
there is at most one positive (non-negated) literal. A Datalog program can be expressed as a
set of rules/implications (where each rule consists
of one positive literal in the consequent of the rule
(the head), and one or more negative literals in
the antecedent of the rule (the body)), a set of
facts (single positive literals) and the actual query
literals (a rule without head, i.e. one or more negative literals). Additionally, we can use negation
as failure in the antecedent of a rule, with the semantics that such a literal cannot be proven from
the knowledge base (see, e.g., [11]).
Literals are predicate expressions describing relations between any combination of variables and constants, such as title(http://www.xyz.com/book.html, ’Artificial
Intelligence’). Each rule is divided into head and
body with the head being a single literal and
the body being a conjunction of any number
of positive literals (including conditions on
variables). Disjunction is expressed as a set of
rules with identical head. A Datalog query then
is a conjunction of query literals plus a possibly
empty set of rules.
Datalog shares with relational databases and
with RDF the central feature that data are conceptually grouped around properties (in contrast
to object oriented systems, which group information within objects usually having object identity).2 Therefore Datalog queries easily map to
relations and relational query languages like relational algebra or SQL. In terms of relational algebra Datalog is capable of expressing selection,
union, join and projection and hence is a relationally complete query language. SQL, disregarding
aggregation and grouping, is a subset of Datalog. Additional features include transitive closure
(now included in SQL3) and other recursive definitions.
2 These views can be combined, though, see, e.g., [12] and [13], and to some extent RDFS, which specifies classes in an object oriented way, even though it does not introduce object identity (though it can easily be extended with it, see [14]).
The example knowledge base in Datalog reads
title(http://www.xyz.com/ai.html,’Artificial Intelligence’).
type(http://www.xyz.com/ai.html,Book).
title(http://www.xyz.com/sw.html,’Software Engineering’).
type(http://www.xyz.com/sw.html,Book).
title(http://www.xyz.com/pl.html,’Prolog’).
type(http://www.xyz.com/pl.html,AI-Book).
In RDF any statement is considered to be an
assertion. Therefore we can view an RDF repository as a set of ground assertions either using
binary predicates as shown above, or as ternary
statements “s(S,P,O)”, if we include the predicate
as an additional argument. In the following examples, we use the binary surface representation,
whenever our query does not span more than one
abstraction level3.
3 See the discussion in Section 4.7.
In (binary) Datalog notation, our example
query is
aibook(X) :- title(X, ’Artificial Intelligence’), type(X, Book).
aibook(X) :- type(X, AI-Book).
?- aibook(X).
Since our query is a disjunction of two (purely
conjunctive) subqueries, its Datalog representation is composed of two rules with identical heads.
The literals in the rules’ bodies directly reflect
RDF statements with their subjects being the
variable X and their objects being bound to constant values such as ’Artificial Intelligence’. Literals used in the head of rules denote derived predicates (not necessarily binary ones).
In our example, the query expression “aibook(X)” asks for all bindings of X, which conform to the given Datalog rules and the knowledgebase to be queried, with the results:
aibook(http://www.xyz.com/ai.html)
aibook(http://www.xyz.com/pl.html)
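The evaluation of this query can be retraced with a small Python sketch. It is only an illustration of the semantics, not a Datalog engine: facts are stored as (predicate, subject, object) tuples, each rule body is evaluated as an intersection of subject sets, and the two rules with identical head are combined by set union. The helper names `subjects` and `aibook` are our own.

```python
BOOK = "http://www.lit.edu/types#Book"
AI_BOOK = "http://www.lit.edu/types#AI-Book"

# The example knowledge base as (predicate, subject, object) facts.
facts = {
    ("title", "http://www.xyz.com/ai.html", "Artificial Intelligence"),
    ("type", "http://www.xyz.com/ai.html", BOOK),
    ("title", "http://www.xyz.com/sw.html", "Software Engineering"),
    ("type", "http://www.xyz.com/sw.html", BOOK),
    ("title", "http://www.xyz.com/pl.html", "Prolog"),
    ("type", "http://www.xyz.com/pl.html", AI_BOOK),
}

def subjects(predicate, obj):
    """All X such that predicate(X, obj) is a fact."""
    return {s for (p, s, o) in facts if p == predicate and o == obj}

def aibook():
    # aibook(X) :- title(X, 'Artificial Intelligence'), type(X, Book).
    rule1 = subjects("title", "Artificial Intelligence") & subjects("type", BOOK)
    # aibook(X) :- type(X, AI-Book).
    rule2 = subjects("type", AI_BOOK)
    # Rules with identical head express disjunction.
    return rule1 | rule2

for x in sorted(aibook()):
    print(f"aibook({x})")
```

The conjunction within a rule body corresponds to the intersection, the two alternative rules to the union.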
4.3. Edutella Common Data and Query Exchange Model
Figure 5. Edutella Common Data and Query Exchange Model (ECDM) (UML class diagram with the classes EduQuery, EduRule, EduLiteral, EduStatementLiteral, EduConditionLiteral, RDFReifiedStatement, EduResultSet, EduResult, EduTupleResult, EduVariableBinding, RDFModel, RDFNode, Resource, Property and Literal)
Internally Edutella Peers use a Datalog based
model to represent queries and their results. Figure 5 visualizes this data model as a UML class diagram. All classes beginning with RDF are standard RDF concepts and reflect their usage in the
Jena RDF API [15].
Each query is represented as an instance
of EduQuery which aggregates an arbitrary
number of EduRule and EduLiteral objects.
EduLiterals are either RDFReifiedStatements
(binary predicates / ternary statement literals, corresponding to reified RDF statements),
EduStatementLiterals (non-ternary statement
literals, that cannot be expressed as ordinary
RDF statements) or EduConditionLiterals (a
condition expression on variables such as X > 5).
In our examples we use different surface notations
of this data model, and switch to a predicate as
argument view, whenever our query spans more
than one abstraction level (see Section 4.7).
Technically, it is sufficient to define a single instance of EduLiteral as query literal. However,
by using a set of EduLiteral objects, all query
literals together can be interpreted as the RDF result graph of the EduQuery, as long as the query
literals are all instances of RDFReifiedStatement.
An EduRule consists of an EduStatementLiteral as its head and an
arbitrary number of EduLiterals as its body.
EduStatementLiterals can occur within a rule’s
body as well to allow reuse of other rules and
recursion.4
In database terms, EduStatementLiterals are
intensional predicates, and are defined through
the head of rules. RDFReifiedStatements
are extensional predicates, and are stored explicitly in the RDF database. Therefore,
RDFReifiedStatements can be expressed by binary predicates / ternary predicate statements,
while EduStatementLiterals can have more
than two arguments for the predicate.
Results are represented either as a set of
RDFModel or EduTupleResult objects depending
on whether the results are requested to be in
RDF graph or tuple format. In the latter case
each EduTupleResult aggregates a number of
EduVariableBinding objects - one for each variable within the query.
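As an illustration of this data model, the following Python sketch mirrors the query-related classes of Figure 5 as plain dataclasses and builds the example query with them. It is not the Java binding: the class names follow Figure 5, but the field layout, the `Variable` helper class and the abbreviated predicate names are simplifying assumptions of ours.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(frozen=True)
class Variable:
    name: str

Node = Union[str, Variable]  # a constant (resource or literal) or a variable

@dataclass
class RDFReifiedStatement:       # extensional: matches stored RDF statements
    subject: Node
    predicate: str
    object: Node
    negated: bool = False

@dataclass
class EduStatementLiteral:       # intensional: defined through rule heads
    predicate: str
    arguments: List[Node]
    negated: bool = False

@dataclass
class EduConditionLiteral:       # condition on variables, e.g. X > 5
    op: str
    arg1: Node
    arg2: Node
    negated: bool = False

EduLiteral = Union[RDFReifiedStatement, EduStatementLiteral, EduConditionLiteral]

@dataclass
class EduRule:
    head: EduStatementLiteral
    body: List[EduLiteral]

@dataclass
class EduQuery:
    rules: List[EduRule] = field(default_factory=list)
    query_literals: List[EduLiteral] = field(default_factory=list)

X = Variable("X")
BOOK = "http://www.lit.edu/types#Book"
AI_BOOK = "http://www.lit.edu/types#AI-Book"

query = EduQuery(
    rules=[
        EduRule(EduStatementLiteral("aibook", [X]),
                [RDFReifiedStatement(X, "dc:title", "Artificial Intelligence"),
                 RDFReifiedStatement(X, "rdf:type", BOOK)]),
        EduRule(EduStatementLiteral("aibook", [X]),
                [RDFReifiedStatement(X, "rdf:type", AI_BOOK)]),
    ],
    query_literals=[EduStatementLiteral("aibook", [X])],
)
print(len(query.rules), "rules,", len(query.query_literals), "query literal")
```

The two rules share the head aibook(X), and the single query literal refers to that derived predicate.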
4.4. Edutella Wrapper API
The following sketches the current prototypical Edutella Wrapper API, version 0.8, used as
a blueprint in our current Edutella wrappers, to
enable Edutella peers to handle our Edutella common data model in a coherent manner. The API
will most likely change in subsequent versions,
but its structure gives a good overview of the
functionalities this API has to provide.
The Java binding (available from the Edutella
Project Page5) is composed of the following packages:
net.jxta.edutella.ecdm: Contains all classes
for the Edutella common data model as described in Figure 5. This common model
is used for transmitting queries within the
Edutella network.
net.jxta.edutella.ecdm.io: Contains parser
and formatter classes for importing queries
given in various formats into the internal
query model or, in turn, exporting queries from
the internal model into other syntaxes and
representations.
net.jxta.edutella.provider: Contains a general Edutella Provider implementation
which runs an Edutella provider service.
Various Edutella providers can realize different wrappers, which correspond to different back-end repositories, and embed these
wrappers into the general implementation
as plug-ins.
net.jxta.edutella.consumer: Contains a general Edutella consumer implementation
which runs an Edutella consumer service.
Various Edutella consumers can realize different adapters to provide different presentations of query results.
net.jxta.edutella.peer/...peer.util:
Contains encapsulations and extensions of
the JXTA interfaces which are used by the
consumer and provider implementations.
4 Note that as input format we can even allow arbitrary
first order logic formulas in the body of rules, which then
can be transformed into a set of rules using the Lloyd-Topor transformation [16].
5 http://edutella.jxta.org/
4.5. RDF-QEL-i Language Levels
In the definition of the Edutella query exchange
language, several important design criteria have
been formulated:
Standard Semantics of query exchange language, as well as a sound RDF serialization. Simple and standard semantics of the query exchange
language is important, as transformations to and
from this language have to be performed within
the Edutella peer wrappers, which have to preserve the semantics of the query in the original
query language. A sound encoding of the queries
in RDF to be shipped around between Edutella
peers has to be provided.
Expressiveness of the language. We want
to interface with both simple graph based query
engines as well as SQL query engines and even
with inference engines. It is important that the
language allows expressing simple queries in a
form that simple query providers can directly use,
while allowing for advanced peers to fully use
their expressiveness.
Adaptability to different formalisms. The
query language has to be neutral to different representation semantics: it should be able to use
any predicates with predefined semantics (like
rdfs:subClassOf), but not have their semantics
built in, in order to be applicable to different formalisms used by the Edutella peers. It should
easily connect to simple RDFS repositories, relational databases and object-relational ones, as well
as to inference systems, with all their different
base semantics and capabilities.
Transformability of the query language. The
basic query exchange language model must be
easy to translate to many different query languages (both for importing and exporting), allowing easy implementation of Edutella peer wrappers.
Edutella follows a layered approach for defining the query exchange language. Currently we
have defined language levels RDF-QEL-1, -2, -3,
-4 and -5, differing in expressiveness. All language levels can be represented through the same
internal data model (see 4.3). A query representation in RDF is also specified, using reified RDF
statements to describe triple patterns. The
simplest language level (RDF-QEL-1) can also be
expressed as an unreified RDF graph, which simplifies query formulation.
4.5.1. RDF-QEL Syntax
As with our internal Datalog model, the RDF
representation of a query is modeled as a set of
rules and query literals. A construct for each
ECDM query literal type is defined. To encode
RDFReifiedStatements we utilize the RDF construct called reification. Reifying an RDF statement involves creating a model of the RDF triple
in the form of an RDF resource of type Statement. This resource has as properties the subject, the predicate and the object of the modeled
RDF triple. Such reified statements are the building blocks for each query. The example query
expressed in RDF-QEL-3 resembles the internal
Datalog model described above.
<edu:QEL3Query rdf:about='#AI_Book_Query'>
<edu:hasRule rdf:resource='#r1'/>
<edu:hasRule rdf:resource='#r2'/>
<edu:hasQueryLiteral rdf:resource='#l1'/>
</edu:QEL3Query>
<edu:Variable rdf:about="#X" rdfs:label="X"/>
<edu:Rule rdf:about='#r1'>
<edu:hasHead>
<edu:StatementLiteral>
<edu:predicate rdf:resource='#aibook'/>
<edu:arguments>
<rdf:Seq>
<rdf:li rdf:resource='#X'/>
</rdf:Seq>
</edu:arguments>
</edu:StatementLiteral>
</edu:hasHead>
<edu:hasBody>
<edu:RDFReifiedStatement>
<rdf:subject rdf:resource='#X'/>
<rdf:predicate rdf:resource='&rdf;type'/>
<rdf:object rdf:resource='&lit;Book'/>
</edu:RDFReifiedStatement>
</edu:hasBody>
<edu:hasBody>
<edu:RDFReifiedStatement>
<rdf:subject rdf:resource='#X'/>
<rdf:predicate rdf:resource='&dc;title'/>
<rdf:object>Artificial Intelligence</rdf:object>
</edu:RDFReifiedStatement>
</edu:hasBody>
</edu:Rule>
<edu:Rule rdf:about='#r2'>
<edu:hasHead>
<edu:StatementLiteral>
<edu:predicate rdf:resource='#aibook'/>
<edu:arguments>
<rdf:Seq>
<rdf:li rdf:resource='#X'/>
</rdf:Seq>
</edu:arguments>
</edu:StatementLiteral>
</edu:hasHead>
<edu:hasBody>
<edu:RDFReifiedStatement>
<rdf:subject rdf:resource='#X'/>
<rdf:predicate rdf:resource='&rdf;type'/>
<rdf:object rdf:resource='&lit;AI-Book'/>
</edu:RDFReifiedStatement>
</edu:hasBody>
</edu:Rule>
<edu:StatementLiteral rdf:about='#l1'>
<edu:predicate rdf:resource='#aibook'/>
<edu:arguments>
<rdf:Seq>
<rdf:li rdf:resource='#X'/>
</rdf:Seq>
</edu:arguments>
</edu:StatementLiteral>
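The reification step used for the RDFReifiedStatement literals can be sketched in a few lines of Python. This is only meant to show what reifying a triple produces (one resource of type Statement with subject, predicate and object properties); the function name `reify` and the statement identifier `#s1` are hypothetical, and triples are plain tuples rather than objects of an RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(subject, predicate, obj, statement_id):
    """Model one RDF triple as a resource of type rdf:Statement.

    Returns the four triples that describe the statement resource.
    """
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# The triple pattern "X has rdf:type AI-Book", with #X standing for the variable.
for triple in reify("#X", RDF + "type", "http://www.lit.edu/types#AI-Book", "#s1"):
    print(triple)
```

Because the pattern is described rather than asserted, a query containing it does not claim that any resource actually is an AI book.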
4.5.2. RDF-QEL-1
RDF-QEL-1 is restricted to conjunctive formulas only. While it is possible to express them using the default RDF-QEL notation, we have designed a special RDF-QEL-1 syntax following the
QBE (Query By Example) paradigm: queries are
represented using ordinary RDF graphs having
exactly the same structure as the answer graph,
with additional annotations to denote variables
and constraints on them. Any RDF graph query
can be interpreted as a logical (conjunctive) formula that is to be proven from a knowledge base.
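How such a conjunctive graph query is proven from a knowledge base can be sketched as follows. This is a naive matcher written for illustration, not the algorithm used by any Edutella peer: the function `match`, the convention that variables are strings starting with "?", and the abbreviated names such as `lit:Book` are our own assumptions.

```python
def match(patterns, triples, binding=None):
    """Return all variable bindings under which every pattern is a known triple."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(match(rest, triples, candidate))
    return results

triples = [
    ("http://www.xyz.com/sw.html", "rdf:type", "lit:Book"),
    ("http://www.xyz.com/sw.html", "dc:title", "Software Engineering"),
    ("http://www.xyz.com/ai.html", "rdf:type", "lit:Book"),
    ("http://www.xyz.com/ai.html", "dc:title", "Artificial Intelligence"),
    ("http://www.xyz.com/pl.html", "rdf:type", "lit:AI-Book"),
    ("http://www.xyz.com/pl.html", "dc:title", "Prolog"),
]

# A query graph with the same shape as the expected answer graph.
query = [
    ("?Y", "rdf:type", "lit:Book"),
    ("?Y", "dc:title", "Artificial Intelligence"),
]
print(match(query, triples))
```

Every triple of the query graph acts as one conjunct, and a shared variable forces the conjuncts to agree on the same resource.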
Since disjunction cannot be expressed in RDF-QEL-1 syntax, our example query has to be split
into two separate sub-queries (Figure 6):
<edu:QEL1Query rdf:about="#AI_Query_1">
<edu:hasVariable rdf:resource="#X"/>
</edu:QEL1Query>
<edu:Variable rdf:about="#X" rdfs:label="X">
<rdf:type rdf:resource="&lit;AI-Book"/>
</edu:Variable>
<edu:QEL1Query rdf:about="#AI_Query_2">
<edu:hasVariable rdf:resource="#Y"/>
</edu:QEL1Query>
<edu:Variable rdf:ID="Y" rdfs:label="X">
<rdf:type rdf:resource="&lit;Book"/>
<dc:title>Artificial Intelligence</dc:title>
</edu:Variable>
Figure 6. Example Query in RDF-QEL-1, Unreified Format
4.5.3. RDF-QEL-2
Extending RDF-QEL-1 with disjunction leads
to RDF-QEL-2. Queries of this type can be transformed into an AND-OR tree of reified statements, allowing for a very user-friendly visualization. The Conzilla query interface [17] is based
on a subset of UML, using the UML specialization relationship for logical OR and the UML aggregation relationship for logical AND. As shown
in Figure 7, our current prototype uses a graph view, which is displayed as ordinary RDF with
the exception that the triples searched for (which
are reified in RDF-QEL-i, where i > 1) are displayed as dashed arrows indicating that they are
searched for. The logical view is displayed as
a parse tree. This is the logical combination of
the primitive statements, showing which combinations should be matched at the same time
in order for the query to succeed. The connections between the different views are displayed
by highlighting the corresponding parts.
Figure 7. Edutella Graph Query Interface
Queries can be stored and reused later, so we can work with a library of queries that can be combined into new queries. Those queries can either be used as is or as templates, where substrings, numerical values, etc. are filled in. Details of sub-queries can be suppressed by hiding them in detailed maps that can be presented hierarchically.
4.5.4. RDF-QEL-3
Going a step further, we arrive at the full Datalog semantics with conjunction, disjunction and
negation of literals. As long as queries are nonrecursive this approach is relationally complete.
4.5.5. Further RDF-QEL-i Levels
RDF-QEL-4: RDF-QEL-4 allows recursion to express transitive closure and linear recursive query definitions, compatible with the SQL3 capabilities. A relational query engine that fully conforms to the SQL3 standard will therefore be able to support the RDF-QEL-4 query level.
RDF-QEL-5: Further levels allow arbitrary recursive definitions in stratified or dynamically stratified Datalog, guaranteeing one single minimal model and thus unambiguous query results ([18]; see footnote 6).
RDF-QEL-i-A: Support for the usual aggregation functions as defined by SQL92 (e.g. COUNT, AVG, MIN, MAX) will be denoted by appending “-A” to the query language level, i.e. RDF-QEL-1-A, RDF-QEL-2-A, etc. RDF-QEL-i-A includes these aggregation functions as edu:count, edu:avg, edu:min, etc. Additional “foreign” functions like edu:substring, to be used in conditions, might be useful as well, but have not yet been included in RDF-QEL-i-A.
RDF-MEL: RDF-MEL is an extension of RDF-QEL-3 with constructs to modify knowledge bases on other peers. It provides commands similar to the SQL INSERT, DELETE and UPDATE statements. See [19] for a detailed description.
4.6. Representing Complex Property Semantics
RDFS already comes with predefined semantics for certain properties (i.e. transitivity of rdfs:subClassOf, inheritance for rdf:type). Whenever the query includes these pre-defined predicates, we presume these to have the pre-defined semantics. This is valid for DAML+OIL predefined predicates and their semantics as well, i.e. if we use definitions like
<daml:TransitiveProperty rdf:ID='hasAncestor'/>
then transitivity of hasAncestor is assumed
hasAncestor(X,Y) :- hasAncestor(X,Z),
                    hasAncestor(Z,Y).
without having to be specified explicitly in the query.
Footnote 6: Technically, when using negation, recursion and the ternary representation of statements, static stratification can never be guaranteed (because we only use one ternary predicate “s(S,P,O)”), so we have to rely on dynamic stratification (which depends on the actual instantiation of literals) or switch to well-founded semantics.
If we want to specify something else, we have in principle to specify its semantics as a Datalog rule and ship it with the query. However, we can add special annotations to properties, like edu:transitive_closure_of (denoting the transitive closure of a property), edu:inherited_version_of (inheritance of properties along the subclassOf hierarchy) and edu:reflexive_version_of (the reflexive version of a property), which can be used directly by the Edutella peer wrapper (whenever it knows what these edu: properties mean). This has the advantage that the wrapper does not have to infer the correct semantics from the corresponding Datalog rules, but can use the predefined semantics for these edu: properties directly. This keeps the clear semantics of RDF-QEL-i, but allows abbreviations which make it easier to write Edutella peer wrappers. Also, while it is possible to axiomatize quite a lot of specific operators in Datalog (including the ones discussed above), Datalog also has its limitations. Datalog (and its extensions) overlaps with description logic fragments of first order logic (e.g. DAML+OIL), but usually cannot axiomatize them completely (the same holds in the other direction).
4.7. Querying Schema Information
As is already apparent from the RDFS schema definition [6], and discussed in more detail in the recent RDF model theory [20] (see also the axiomatic definition of an extension of RDFS we called O-Telos-RDF [14]), RDFS does not distinguish between data and schema level, and represents all information in a uniform way as a graph. Indeed, as discussed in more detail in [14], there is no difference in principle between entities at different modeling levels (i.e. objects, classes and meta-classes are represented in a uniform way), and queries over an RDFS schema should not be more difficult than queries over RDFS data.
Therefore our internal query exchange model as
shown in Figure 5 treats entities on all levels in a
uniform way (as RDFNodes), and the attributes
of EduStatementLiterals can be entities on different levels (objects, classes or even predicates).
Therefore, representing queries at different levels
does not pose problems.
In order to express Datalog like queries ranging
over different abstraction levels, instead of writing properties as binary predicates, we have to
switch to a triple syntax using a ternary predicate
“s”, i.e. instead of writing “book(X,’Artificial Intelligence’)” we write “s(X,book,’Artificial Intelligence’)”. If we enforce the restriction, that the
predicate symbol “s” always denotes this special
ternary predicate, we can also mix this notation
with the binary predicate notation we used so far
in our examples. Generalizing the query from our
running example a bit, we now want to ask for
any additional property our AI books might have,
getting the query:
aibook(X) :- title(X, 'Artificial Intelligence'),
             type(X, Book).
aibook(X) :- type(X, AI-Book).
book_property(P) :- s(P, rdfs:domain, Book).
ai_book_property(P) :- s(P, rdfs:domain, AI-Book).
ai_book_attribute(X,P,V) :- aibook(X), book_property(P), s(X,P,V).
ai_book_attribute(X,P,V) :- aibook(X), ai_book_property(P), s(X,P,V).
?- ai_book_attribute(X,P,V)
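To make the mixed binary/ternary notation concrete, the following is a minimal Python sketch of how such a query could be evaluated over a small in-memory set of triples. It is an illustration only, not the Edutella implementation: the triple set, the shortened names (e.g. "ai.html", "title") and the property "publisher" are hypothetical, and the function s merely mimics the ternary predicate s(S,P,O).

```python
# Toy triple store. Names are shortened for illustration; "publisher" and
# its value are hypothetical and do not appear in the paper's example.
TRIPLES = {
    ("ai.html", "type", "Book"),
    ("ai.html", "title", "Artificial Intelligence"),
    ("sw.html", "type", "Book"),
    ("sw.html", "title", "Software Engineering"),
    ("pl.html", "type", "AI-Book"),
    ("pl.html", "title", "Prolog"),
    # Schema-level statements live in the same set as the data.
    ("title", "rdfs:domain", "Book"),
    ("publisher", "rdfs:domain", "AI-Book"),
    ("pl.html", "publisher", "ACME Press"),
}


def s(subject=None, predicate=None, obj=None):
    """Ternary predicate s(S,P,O): yield all triples matching the bound arguments."""
    for t in TRIPLES:
        if ((subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)):
            yield t


def aibook():
    """Union of the two aibook(X) rules."""
    titled = {x for x, _, _ in s(predicate="title", obj="Artificial Intelligence")}
    books = {x for x, _, _ in s(predicate="type", obj="Book")}
    ai_books = {x for x, _, _ in s(predicate="type", obj="AI-Book")}
    return (titled & books) | ai_books


def ai_book_attribute():
    """Union of the two ai_book_attribute(X,P,V) rules."""
    props = {p for p, _, _ in s(predicate="rdfs:domain", obj="Book")}
    props |= {p for p, _, _ in s(predicate="rdfs:domain", obj="AI-Book")}
    return {(x, p, v) for x in aibook() for p in props for _, _, v in s(x, p)}
```

Because schema statements (rdfs:domain) and data statements are stored in the same set, the query ranges over both levels without any special treatment, which is the point made in this section.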
4.8. Result Formats
4.8.1. Standard Result Set Syntax
As a default, we represent query results as a set
of tuples of variables with their bindings. Referring to our example there are two bindings for a
single variable:
<edu:ResultSet rdf:about='#AI_Results'>
  <edu:hasResult>
    <edu:TupleResult>
      <edu:hasBinding>
        <edu:VariableBinding>
          <edu:bindsVariable rdf:resource='#X'/>
          <rdf:value
            rdf:resource='http://www.xyz.com/ai.html'/>
        </edu:VariableBinding>
      </edu:hasBinding>
    </edu:TupleResult>
  </edu:hasResult>
  <edu:hasResult>
    <edu:TupleResult>
      <edu:hasBinding>
        <edu:VariableBinding>
          <edu:bindsVariable rdf:resource='#X'/>
          <rdf:value
            rdf:resource='http://www.xyz.com/pl.html'/>
        </edu:VariableBinding>
      </edu:hasBinding>
    </edu:TupleResult>
  </edu:hasResult>
</edu:ResultSet>
This is also shown in Figure 5, and closely follows the convention of returning substitutions for
variables occurring in queries to logic programs.
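As an illustration of how a client might consume such a result set, the following Python sketch turns the XML into a list of variable-to-value substitutions. It is a sketch under stated assumptions: the edu namespace URI used here is a placeholder (the real Edutella namespace is not given in this section), the surrounding rdf:RDF element is added only to declare namespaces, and parse_result_set is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EDU = "http://example.org/edutella#"  # placeholder URI, assumed for this sketch

RESULT_XML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:edu="{EDU}">
  <edu:ResultSet rdf:about="#AI_Results">
    <edu:hasResult><edu:TupleResult><edu:hasBinding><edu:VariableBinding>
      <edu:bindsVariable rdf:resource="#X"/>
      <rdf:value rdf:resource="http://www.xyz.com/ai.html"/>
    </edu:VariableBinding></edu:hasBinding></edu:TupleResult></edu:hasResult>
    <edu:hasResult><edu:TupleResult><edu:hasBinding><edu:VariableBinding>
      <edu:bindsVariable rdf:resource="#X"/>
      <rdf:value rdf:resource="http://www.xyz.com/pl.html"/>
    </edu:VariableBinding></edu:hasBinding></edu:TupleResult></edu:hasResult>
  </edu:ResultSet>
</rdf:RDF>
"""


def parse_result_set(xml_text):
    """Return one dict per TupleResult, mapping variable names to bound resources."""
    ns = {"rdf": RDF, "edu": EDU}
    resource = f"{{{RDF}}}resource"  # qualified name of the rdf:resource attribute
    tuples = []
    root = ET.fromstring(xml_text)
    for tup in root.iterfind(".//edu:TupleResult", ns):
        bindings = {}
        for vb in tup.iterfind("edu:hasBinding/edu:VariableBinding", ns):
            var = vb.find("edu:bindsVariable", ns).get(resource)
            val = vb.find("rdf:value", ns).get(resource)
            bindings[var.lstrip("#")] = val
        tuples.append(bindings)
    return tuples
```

Each dictionary corresponds to one substitution in the logic programming sense; a tuple with several bindings would simply produce a dictionary with several keys.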
4.8.2. RDF Graph Answers
Another possibility, which has been explored
recently in Web related languages focusing on
querying semistructured data (for an overview
see, e.g., [21]), is the ability to create objects as
query results. In the simple case of RDF-QEL-1,
we can return as answer objects the graph representing the RDF-QEL-1 query itself with all
Edutella specific statements removed and all variables instantiated. The results can be interpreted
as the relevant sub graph of the RDF graph we
are running our queries against (see Figure 3). In
other words, the answer graph contains sufficient
information, so that running the query using only
the data in the answer graph returns the same
result as running the query against the original
database.
<lib:Book about="http://www.xyz.com/ai.html">
<dc:title>Artificial Intelligence</dc:title>
</lib:Book>
<lib:AI-Book about="http://www.xyz.com/pl.html"/>
When we use general RDF-QEL-i queries, we assume the structure of the answer graph to be defined by the query literals (provided they are all binary predicates). Note that all variables used in the query literals are assumed to be existentially quantified, so if they are not instantiated during the query evaluation, they are represented as anonymous nodes in the RDF answer graph (as discussed in [20]; see footnote 7).
Footnote 7: Anonymous nodes, i.e. existential variables in the RDF graph itself, can be handled by the usual Lloyd-Topor transformation [16].
An additional interesting extension is to allow skolem functions in the head literals of our rules, which allows us to generate arbitrarily complex objects using these skolem values as object IDs (see also [21] or the original F-Logic [12] proposal). The current Edutella version does not support this functionality, but future versions will include support based on the TRIPLE semantics [22,23].
5. Wrapping Different Peer Query Languages
The following sections are not meant to be complete characterizations of the mappings but rather sketch these mappings and translate our example query into the local query language. Further details will be found in forthcoming reports.
5.1. RQL
RQL is an RDF query language described in [24] and [25], and used within the EU-IST project On-To-Knowledge. RQL focuses on SQL-like query expressions, exploiting path expressions, implicit and explicit joins, and the usual comparison operators. All examples in [26] and [25] can be expressed using conjunctive queries, though the formal RQL specification also includes all set operations (union, intersect and minus), making it relationally complete.
The default for queries including typeOf and subclassOf is to use transitivity of subclassOf, as defined by RDFS. These queries would be translated using simple typeof(X,Y) and subclassof(X,Y) binary predicates, as the transitivity of subclassof is reflected in the query engine, not in the query language.
Additionally, RQL specifies the variants typeOf^ and subclassOf^ (“direct” typeOf and subclassOf), which can be defined in Datalog as follows (assuming subclassOf not to be reflexive, as advocated in [6], though reflexivity could easily be included in the axiomatization):
typeof^(X,Y) :- typeof(X,Y),
                not(typeof(X,Z),
                    subclassof(Z,Y)).
subclassof^(X,Y) :- subclassof(X,Y),
                    not(subclassof(X,Z),
                        subclassof(Z,Y)).
or the other way around, if we assume that the
local peer stores only the “typeofˆ ” and “subclassofˆ ” facts
subclassof(X,Y) :- subclassof^(X,Y).
subclassof(X,Y) :- subclassof(X,Z),
subclassof^(Z,Y).
typeof(X,Y) :- typeof^(X,Y).
typeof(X,Y) :- subclassof(Z,Y),
typeof^(X,Z).
Using definitions like these for “subclassof” presumes recursive query capability in the
provider who has to evaluate these definitions, or
at least a transitive closure operator (which is for
example provided in SQL3).
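The recursive definitions above can be illustrated with a small fixpoint computation. The following Python sketch derives subclassof and typeof from the "direct" facts stored by a peer; it is only an illustration of the two rule pairs, not how any particular provider evaluates them, and the sample facts in the usage note are hypothetical.

```python
def transitive_closure(direct):
    """subclassof(X,Y) :- subclassof^(X,Y).
    subclassof(X,Y) :- subclassof(X,Z), subclassof^(Z,Y)."""
    closure = set(direct)
    changed = True
    while changed:  # naive fixpoint iteration: repeat until nothing new is derived
        changed = False
        for (x, z) in list(closure):
            for (z2, y) in direct:
                if z == z2 and (x, y) not in closure:
                    closure.add((x, y))
                    changed = True
    return closure


def typeof(direct_typeof, subclassof):
    """typeof(X,Y) :- typeof^(X,Y).
    typeof(X,Y) :- subclassof(Z,Y), typeof^(X,Z)."""
    derived = {(x, y)
               for (x, z) in direct_typeof
               for (z2, y) in subclassof
               if z == z2}
    return set(direct_typeof) | derived
```

For example, with the hypothetical direct facts subclassof^ = {("AI-Book", "Book"), ("Book", "Publication")} and typeof^ = {("pl.html", "AI-Book")}, the closure adds ("AI-Book", "Publication"), and typeof then relates pl.html to AI-Book, Book and Publication.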
Example query in RQL:
select X
from Book{X}.title{Y}
where Y = "Artificial Intelligence"
UNION
select X
from AI-Book{X}
Structurally and syntactically the query looks
similar to its SQL counterpart. RQL uses its
own syntax and does not come with any RDF
XML serialization. The RDF type statements
do not need to be made explicit since the RDFS
class concept is an inherent part of RQL. Both
sub queries use linear path expressions. In case
the queries are based on more complex graph
structures several linear path expressions are enumerated in the FROM clause and have to be
joined explicitly by a WHERE clause. For querying schema information, which is also possible in
RQL, we translate the RQL query expressions
into Edutella Datalog programs using the ternary
notation of triples, as discussed in Section 4.7.
5.2. TRIPLE
TRIPLE is an RDF query, inference, and transformation language [22] based on Horn logic and F-Logic [12]. TRIPLE’s architecture allows semantic features to be defined for various object-oriented and other RDF extensions like RDF Schema. TRIPLE provides a (human readable) Prolog-like syntax as well as an RDF-based syntax for exchanging queries and rules. In the Prolog-like syntax, RDF statements are written as molecules, i.e.,
subject[predicate -> object]
or, for multiple predicate-object pairs for one subject,
subject[pred1 -> obj1; pred2 -> obj2; ...]
Our sample knowledge base and query can be
mapped as follows:
// namespace abbreviations
rdf := 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'.
dc := 'http://purl.org/dc/elements/1.0/'.
types := 'http://www.lit.edu/types#'.
xyz := 'http://www.xyz.com/'.
// sample knowledge base
xyz:'sw.html'[ rdf:type -> types:Book;
               dc:title -> 'Software Engineering' ].
xyz:'ai.html'[ rdf:type -> types:Book;
               dc:title -> 'Artificial Intelligence' ].
xyz:'pl.html'[ rdf:type -> types:'AI-Book';
               dc:title -> 'Prolog' ].
// sample query
FORALL X aibook(X) <-
  X[rdf:type -> types:'AI-Book']
  OR X[rdf:type -> types:Book;
       dc:title -> 'Artificial Intelligence'].
TRIPLE regards RDF data not as one large
heap, but partitions the set of RDF data in different subsets, called RDF models. Different subsets could be coming from different sources, have
different semantics etc. TRIPLE supports models, parameterized models, model expressions,
etc., which are useful extensions to RDF-QEL-i
as well, when we want to concentrate more on
data integration and transformation using different sources.
In general, at least RDF-QEL-3 is needed to capture TRIPLE programs, which are then very close to our internal data model (both being based on Horn logic). The current Edutella model supports basic Horn-TRIPLE; currently unsupported features are, e.g., functional terms and RDF models as sets of statements. For a longer elaboration on the complete TRIPLE semantics see [23], which will also serve as a guide to further extensions of the Edutella Common Data Model.
5.3. SQL
To keep the discussion simple, we assume a single STATEMENTS table (i.e., we use the triple-oriented view and ternary statements) storing all statements the repository is aware of in a relational database (see footnote 8). The table consists of three columns SUBJECT, PREDICATE and OBJECT of type character string, with each column value being interpreted either as the concatenation of namespace and resource name or as a literal value. More sophisticated database schemas might provide a view according to this one-table schema.
The mapping of the example query is straightforward. In terms of its Datalog representation the query may be satisfied by either of its two rules, with the first being a conjunction of two literals and the second only involving a single literal. The disjunction of the two rules is mapped to a UNION of two sub queries. This query structure directly resembles the RDF-QEL-3 syntax as well as the internal Datalog query model.
SELECT S1.SUBJECT
FROM STATEMENTS S1, STATEMENTS S2
WHERE S1.PREDICATE =
      'http://purl.org/dc/elements/1.1/#title'
  AND S1.OBJECT = 'Artificial Intelligence'
  AND S2.PREDICATE =
      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  AND S2.OBJECT = 'http://www.lit.edu/types#Book'
  AND S1.SUBJECT = S2.SUBJECT
UNION
SELECT S1.SUBJECT
FROM STATEMENTS S1
WHERE S1.PREDICATE =
      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
  AND S1.OBJECT = 'http://www.lit.edu/types#AIBook'
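The mapping can be tried out against a throwaway database. The following Python sketch uses the standard sqlite3 module and an in-memory table; SQLite merely stands in for whatever relational store a peer actually uses, the sample rows mirror the three resources of the running example, and run_example is a hypothetical helper name. URIs are passed as parameters rather than written inline, which is a choice of this sketch.

```python
import sqlite3

# Predicate and class URIs as used in the SQL mapping above.
TITLE = "http://purl.org/dc/elements/1.1/#title"
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BOOK = "http://www.lit.edu/types#Book"
AIBOOK = "http://www.lit.edu/types#AIBook"

ROWS = [
    ("http://www.xyz.com/sw.html", TYPE, BOOK),
    ("http://www.xyz.com/sw.html", TITLE, "Software Engineering"),
    ("http://www.xyz.com/ai.html", TYPE, BOOK),
    ("http://www.xyz.com/ai.html", TITLE, "Artificial Intelligence"),
    ("http://www.xyz.com/pl.html", TYPE, AIBOOK),
    ("http://www.xyz.com/pl.html", TITLE, "Prolog"),
]

QUERY = """
SELECT S1.SUBJECT
  FROM STATEMENTS S1, STATEMENTS S2
 WHERE S1.PREDICATE = ? AND S1.OBJECT = 'Artificial Intelligence'
   AND S2.PREDICATE = ? AND S2.OBJECT = ?
   AND S1.SUBJECT = S2.SUBJECT
UNION
SELECT S1.SUBJECT
  FROM STATEMENTS S1
 WHERE S1.PREDICATE = ? AND S1.OBJECT = ?
"""


def run_example():
    """Load the sample triples into an in-memory table and run the UNION query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE STATEMENTS (SUBJECT TEXT, PREDICATE TEXT, OBJECT TEXT)"
    )
    con.executemany("INSERT INTO STATEMENTS VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY, (TITLE, TYPE, BOOK, TYPE, AIBOOK)).fetchall()
    con.close()
    return sorted(row[0] for row in rows)
```

Calling run_example() returns the subjects ai.html and pl.html (as full URIs), matching the two bindings of the standard result set in Section 4.8.1.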
More complicated queries with rules referencing other rules in their bodies need to be modeled either as nested queries or as derived relations [11]. It is not possible to map queries containing recursive rules to traditional SQL (that is, SQL92 or below); therefore SQL92 only covers queries up to RDF-QEL-3. With SQL3 supporting linear recursion we are also able to map RDF-QEL-4 queries.
Footnote 8: Some RDF stores include additional information like models, etc., which we neglect in this section.
For those more familiar with relational algebra expressions, we also provide the query in relational algebra. In contrast to the statement centric view used in the SQL mapping, we use a predicate centric approach here, with one relation (or table) for each predicate.
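\[
\Pi_{\mathrm{TYPE.subject}}\Big(\big(\sigma_{\mathrm{object}=\text{Artificial Intelligence}}(\mathrm{TITLE}) \bowtie \sigma_{\mathrm{object}=\text{Book}}(\mathrm{TYPE})\big) \cup \sigma_{\mathrm{object}=\text{AIBook}}(\mathrm{TYPE})\Big)
\]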
5.4. XPath/Apache Xindice
The open source native XML database Apache Xindice [27] provides a natural way to store XML-based learning resource metadata. In the following we show an example knowledge base stored in the metadata repository (using DCMI RDF/XML binding [28]):
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
    rdf:about="http://www.xyz.org/sw.html">
    <dc:title>Software Engineering</dc:title>
    <dc:type>Book</dc:type>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://www.xyz.org/ai.html">
    <dc:title>Artificial Intelligence</dc:title>
    <dc:type>Book</dc:type>
  </rdf:Description>
  <rdf:Description
    rdf:about="http://www.xyz.org/pl.html">
    <dc:title>Prolog</dc:title>
    <dc:type>AIBook</dc:type>
  </rdf:Description>
</rdf:RDF>
Because Apache Xindice employs the W3C XPath [29] to accomplish its query service, the first task for content providers is to map RDF-QEL to a query representation in XPath. XPath provides several XML-specific features (like retrieving hierarchical substructures from a whole XML document). If we abstract from these features and focus on the features comparable to a relational query language, XPath basically provides select statements identifying specific tags within XML documents. So conjunctive queries in RDF-QEL can be mapped into conjunctions of XPath expressions. In case the XML database also supports disjunction between XPath expressions, we can also translate RDF-QEL queries into appropriate XPath queries using both AND and OR Boolean connectives. The translation between XML and RDF schemas such as IMS or LOM for learning resources is especially easy, because these schemas basically define hierarchically structured metadata, where the differences between RDF and XML hardly matter. Our example query can be written in the syntax of XPath as:
//*[@rdf:about
    and (dc:title
           [@rdf:resource="Artificial Intelligence"]
         or dc:title
           [text()="Artificial Intelligence"])
    and (dc:type [@rdf:resource="Book"]
         or dc:type [text()="Book"])
   ]
|
//*[@rdf:about and
    (dc:type [@rdf:resource="AIBook"]
     or dc:type [text()="AIBook"])
   ]
and the results will be
<rdf:Description
  rdf:about="http://www.xyz.org/ai.html">
  <dc:title>Artificial Intelligence</dc:title>
  <dc:type>Book</dc:type>
</rdf:Description>
<rdf:Description
  rdf:about="http://www.xyz.org/pl.html">
  <dc:title>Prolog</dc:title>
  <dc:type>AIBook</dc:type>
</rdf:Description>
Note that the default behavior for XPath result sets in Apache Xindice is different from the result sets defined for our RDF-QEL-i queries. RDF-QEL-i queries return exactly the tuples which we mention in the query, while XPath queries in Apache Xindice by default return the whole sub-document located below the element selected by the XPath expression, but might also identify the whole document matching this expression. In general, this wrapper does not aim to map XPath and RDF-QEL completely, but rather handles a common subset of both languages during the translation process.
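The selection expressed by the XPath query can be reproduced outside Xindice for experimentation. The following Python sketch uses the standard xml.etree.ElementTree module, whose XPath subset does not cover the "or" and "|" operators, so the disjunction is written as ordinary Python logic instead. It is an illustration rather than the wrapper's actual code: it handles only the text() form of dc:title and dc:type (not the rdf:resource form), and ai_books is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

KB = f"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://www.xyz.org/sw.html">
    <dc:title>Software Engineering</dc:title>
    <dc:type>Book</dc:type>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.xyz.org/ai.html">
    <dc:title>Artificial Intelligence</dc:title>
    <dc:type>Book</dc:type>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.xyz.org/pl.html">
    <dc:title>Prolog</dc:title>
    <dc:type>AIBook</dc:type>
  </rdf:Description>
</rdf:RDF>"""


def text_of(elem, tag):
    """Return the text of the first child with the given qualified tag, or None."""
    child = elem.find(tag)
    return child.text if child is not None else None


def ai_books(xml_text):
    """Return rdf:about of descriptions matching either branch of the example query."""
    root = ET.fromstring(xml_text)
    hits = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        title = text_of(desc, f"{{{DC}}}title")
        kind = text_of(desc, f"{{{DC}}}type")
        if (title == "Artificial Intelligence" and kind == "Book") or kind == "AIBook":
            hits.append(desc.get(f"{{{RDF}}}about"))
    return hits
```

Unlike the Xindice default, this sketch returns only the resource identifiers, which is closer to the tuple-oriented results of RDF-QEL-i described above.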
5.5. AmosQL and Mediators
Amos II [30] is a distributed mediator engine where views of data from several different data sources can be defined and queried. The views are defined using the functional and object-oriented query language AmosQL. An Amos II based wrapper for RDF and RDFS has been implemented. It allows general queries accessing RDF and RDFS meta-data descriptions.
RDF-QEL-i queries are translated to AmosQL statements, as will be shown. However, it is more challenging to represent the mediators themselves in RDF or RDF-Schema, because this requires a rich data model, e.g. many data types and mediation primitives, which are needed for mediation from many different kinds of data sources. Amos II provides such mediation primitives [31,32].
An Amos II object is classified into one or
more types making the object an instance of those
types. The set of all instances of a type is called
the extent of the type. The types are organized in
a supertype/subtype hierarchy. If an object is an
instance of a type, then it is also an instance of all
the supertypes of that type; conversely, the extent
of a type is a subset of the extent of a supertype
of that type, extent-subset semantics. Types in
Amos II correspond to classes in RDF-Schema.
RDFS also uses extent-subset semantics.
Object attributes, queries, methods, and relationships are modeled by functions in Amos II. Depending on their implementation the functions can be classified into several kinds, including stored functions that represent facts and correspond to properties in RDFS, derived functions that represent views and correspond to rules in Datalog, and foreign functions that implement algorithms external to the query language (e.g. in Java). When wrapping external data sources with Amos II, the multi-directional foreign function facility [30] provides the primitives to specify access paths and capabilities of the sources.
The general syntax for AmosQL queries is:
select <result>
from <domain specifications>
where <condition>
For example,
select distinct X
from Book X, AI-Book Y
where title(X) = 'Artificial Intelligence' or
      X = Y;
Each domain specification associates a query variable with a type, where the variable is universally quantified over the extent of the type, including indefinite extents such as integers, with some restrictions. This is different from SQL, where variables range only over tuples in tables.
Since the semantic data model of Amos II, based on types and functions, has many similarities with RDFS, it is straightforward to map RDFS metadata descriptions to Amos II schema definitions. The basic RDF model is essentially a binary relational model which, since AmosQL is relationally complete, can easily be stored and queried using AmosQL. An RDF parser first translates RDF statements to corresponding binary relationships (triples) in Amos II. However, RDFS requires further processing to semantically enrich the basic RDF representation to include functions (properties), types (classes), and inheritance. Therefore, after an RDF meta-data document is loaded, the system goes through the loaded binary RDF relationships to find the RDFS type, inheritance, and property definitions, from which the corresponding meta-definitions in Amos II are automatically generated as a set of type and function definitions. These definitions are defined as views in terms of the basic RDF binary relations. In this way we can maintain both the basic binary RDF representation of meta-data and the semantic views that access it, thus making it possible to query the data using different models with different semantic expressiveness.
Meta-objects (schema elements) in Amos II mediators, such as types and functions, are first class and can be queried like any other objects, as in RDFS. The transparent representation of meta-objects in the mediators allows powerful queries about the capabilities and structure of each mediator.
A query compiler translates AmosQL statements into object calculus and algebra expressions in an internal simple logic based language called ObjectLog [33], which is an object-oriented dialect of Datalog where predicates are typed and can be overloaded.
Since AmosQL is relationally complete, and RDF statements are represented as binary relationships, it is easy to translate RDF-QEL-3 into AmosQL. Queries in RDF-QEL-3 are specified as binary RDF relationships stored in the database. A declarative query then constructs the corresponding AmosQL query string from the RDF-QEL-3 query specification given as RDF triples. This query string is then sent to the Amos II query engine for evaluation. Notice that RDF-QEL-3 queries are mapped to AmosQL queries over the triple space. Semantically richer queries using RDFS can also be processed by querying the semantic views corresponding to RDFS definitions rather than RDF triples.
AmosQL does not permit recursive views; instead a transitive closure meta-function is provided to handle most situations requiring recursive views.
6. Registration Service and Query Mediators
The wrapper-mediator approach introduced in
[34], divides the functionality of a data integration system into two kinds of subsystems. The
wrappers provide access to the data in the data
sources using a common data model (CDM) and a
common query language. The mediators provide
coherent views of the data in the data sources by
performing semantic reconciliation of the CDM
data representations provided by the wrappers.
Both common data model (ECDM) and common
query language for the Edutella network have
been defined in this paper.
To mediate distributed data sources we are using a two-layered approach: Simple ’wrapping’ mediators distribute queries to the appropriate peer, with the restriction that queries can be answered completely by one Edutella peer. Complex ’integrating’ mediators are able to mediate distributed queries over multiple repositories. The query syntax is identical for both kinds of mediator.
6.1. Simple Wrapping Mediators
The first layer of functionality for distributed
queries in the Edutella network will be based
on simple query hubs and wrapping mediation.
While query hubs might have some wrapping capability, our prototype peers will use them only
as registration and query distribution peers using the Edutella common data and query model,
and implement wrapping capability (to and from
the common model) locally within the Edutella
peer wrappers as discussed in Section 4.4. Thus,
each Edutella peer offers a common query interface based on the common model (possibly at different levels as defined by RDF-QEL-i) to the network.
Registration of peer query capabilities is based on (instantiated) property statements and schema information, telling the network which kind of schema the peer uses, with some possible value constraints (select conditions). These registration messages have the same syntax as RDF-QEL-1 queries, and are sent from the peer to the registration / query distribution hub. Additionally, the peer announces to the hub which query level it can handle (RDF-QEL-1, RDF-QEL-2, etc.). Whenever the hub receives queries, it uses these registrations to forward queries to the appropriate peers, merges the results, and sends them back as one result set.
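The registration and forwarding behavior described above can be sketched as follows. This is a deliberately simplified Python illustration, not the Edutella hub: registrations are reduced to a set of property names plus the highest supported RDF-QEL level, value constraints are omitted, and the names Registration, Hub, route, answer and the ask callback are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    peer: str
    properties: frozenset  # properties (schema elements) the peer can answer
    max_level: int         # highest RDF-QEL-i level the peer announced


@dataclass
class Hub:
    registrations: list = field(default_factory=list)

    def register(self, peer, properties, max_level):
        """Record a peer's schema information and supported query level."""
        self.registrations.append(
            Registration(peer, frozenset(properties), max_level)
        )

    def route(self, query_properties, level):
        """Peers that can answer the whole query on their own at this level."""
        needed = set(query_properties)
        return [r.peer for r in self.registrations
                if needed <= r.properties and level <= r.max_level]

    def answer(self, query_properties, level, ask):
        """Forward the query, then merge the peers' rows into one result set."""
        merged = []
        for peer in self.route(query_properties, level):
            for row in ask(peer):  # ask(peer) stands in for the network call
                if row not in merged:
                    merged.append(row)
        return merged
```

The subset test in route reflects the restriction that a simple wrapping mediator only forwards queries that a single peer can answer completely; joining partial answers from several peers is left to the integrating mediators of the second layer.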
6.2. Mediator Peers handle Distributed Queries
The second layer introduces query mediators or
query hubs. These mediators bring in the extra
intelligence required to assemble distributed and
heterogeneous queries. These more complex mediators submit subqueries to different repositories
that might be able to answer them, collect the
sub-results, join and reconcile them, and again
return the outcome to the client.
Several mediator servers will be available communicating through JXTA. Each mediator peer
has its own mediator meta-data schema and accesses meta-data from other mediators or data
sources. The views provided through the integrating mediators are transparently queryable using RDF-QEL-i.
In Amos II each mediator peer appears as a virtual database layer having object-oriented data abstractions and query language. Object-oriented views provide transparent access to the data sources from clients and other mediator peers. Conflicts and overlaps between similar real-world entities being modeled differently in different data sources are reconciled through the mediation primitives [31,32] of Amos II, which are translated to ObjectLog. The mediation services allow transparent access to similar object structures represented differently in different data sources.
The representation of integrating mediators in
RDF requires a richer data model than what
is currently available in RDF or RDF-Schema.
Alternatively various conventions can be introduced in the RDF-based meta-data definitions,
e.g. some convention is needed on how to represent type annotated and generic Datalog rules,
since ObjectLog rules can be overloaded on types.
A somewhat inelegant way would be to use different name spaces for this but type annotated
properties seem more convenient. Second, RDF
currently does not have views and can therefore
not represent mediators that join data from different sources. Named RDF-QEL-i queries would
provide a way to specify views. Derived Amos
II functions would correspond to derived properties defined as named RDF-QEL-i queries. Third,
the mediation primitives for reconciling overlapping and conflicting information in data sources
need RDF bindings.
Mediators can cooperate by being defined in
terms of other mediators, i.e. the mediators are
composable [35,30,36]. The composition of mediators allows for modularity and reuse of the view
definitions while avoiding the administrative and
performance bottleneck of having a single mediator system with a global schema. Different interconnecting topologies can be used to compose
mediator servers depending on the integration requirements. Queries to mediator peers are decomposed into optimized distributed query plans
[37,36].
7. Prototype and Application Scenario
Our current prototype setup features a set
of (already existing) peers, which we have extended with the appropriate Edutella wrappers,
and which connect to the Edutella framework
with the following functionalities: local query (directly to repository), distributed query (mediated
by a simple wrapper mediator and by an AMOS
II mediator peer) and update (through annotation
peer).
The following peers can be connected to the Edutella network using the Edutella wrapper libraries:
- OLR Repository peer [38], based on a subset of IMS/LOM metadata; will be able to translate from RDF-QEL-3 into the internal query language SQL and return results in the specified result format.
- DbXML peer [39], as a prototype for an XML-DB, based on a subset of IMS/LOM metadata, using a simple mapping service to translate from RDF-QEL-1 queries to XPath queries over the appropriate XML-LOM schema.
- AMOS II peer (with local repository) [30]; translates from RDF-QEL-3 into AmosQL.
- Simple query and registration hub; distributes queries based on schema information and query capabilities.
- Complex mediation peer; mediates queries using AMOS II based mediation, with one AMOS II peer and one OLR repository peer.
- Graphical query interface peer based on Conzilla [17]; takes a graph and translates it to a query expression, which can then be pushed into the Edutella network, and visualizes results, with RDF-QEL-1 and RDF-QEL-2 functionality.
- JXTA shell peer as well as a textual interface implemented via servlet, for direct query input in RDF-QEL-3.
- Annotation peer based on Ontomat [40]; queries a repository, updates/annotates the results, and writes them back to the repository.
- KAONServer (see footnote 9), formerly OntoBroker, see also [41].
- OAI peer, which acts as a bridge to integrate repositories providing an Open Archive Initiative interface into the Edutella network [42].
- Storage and computation peer with Datalog capabilities for RDFS and O-Telos-RDF ([43], based on ConceptBase Server [13]).
- File based repository peer based on the JENA toolkit, with the corresponding query language RDQL [15], which stores its RDF data in files.
Source code of the Edutella implementation can be downloaded from the Edutella Project Page.
Footnote 9: http://kaon.aifb.uni-karlsruhe.de/
[Figure 8 diagram labels: Smart Learning Space; Personal Learning Assistant (two instances); Edutella Peer-to-Peer Infrastructure; Edutella query hub; Learning Management Network; Resource Discovery and Announcements; Rating/Evaluation Service Provider; Educational Service Provider (two instances); Educational resources; Booking and Access Control via Web Service; Metadata describing educational services; Learning Passport; Edutella interface; Web Service interface.]
Figure 8. Elena Smart Learning Space
The EU project Elena uses Edutella as its basic
infrastructure to create a smart learning space, a
network of learning services from already existing
service providers [44]. All offers within this learning space are described in RDF at the different educational service providers. The content is not restricted to descriptions of on-line learning objects;
for example, booking and rating information will
also be provided. Service providers use one of the
available wrappers to integrate their offers into
the network. Learners access this information via
their personal learning assistant (PLA), which is
also connected to the network (see Figure 8). PLAs find
suitable services according to learners' requests, using the Edutella query service. They take advantage of a personal profile in order to augment the
learners' queries and personalize query results.
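The query augmentation step performed by a PLA can be illustrated with a minimal sketch. It is not taken from the Elena or Edutella code base: the triple-pattern representation, the `Profile` class, the `augment` and `answers` functions, and the sample offers are all hypothetical, and the Dublin Core style property names are used only as plausible examples. The sketch treats a query as a conjunction of triple patterns over a single variable and adds one pattern per profile preference, without overriding anything the learner asked for explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (subject, predicate, object); names starting with "?" are variables.
Triple = Tuple[str, str, str]


@dataclass
class Profile:
    """Hypothetical learner profile: metadata property -> preferred value."""
    preferences: Dict[str, str] = field(default_factory=dict)


def augment(query: List[Triple], profile: Profile,
            subject: str = "?offer") -> List[Triple]:
    """Add one pattern per profile preference, unless the learner's own
    query already constrains that property."""
    constrained = {p for s, p, _ in query if s == subject}
    extra = [(subject, p, v) for p, v in profile.preferences.items()
             if p not in constrained]
    return list(query) + extra


def answers(query: List[Triple], offers: Dict[str, Dict[str, str]],
            subject: str = "?offer") -> List[str]:
    """Return the names of offers whose description satisfies every pattern."""
    return [name for name, description in offers.items()
            if all(description.get(p) == o
                   for s, p, o in query if s == subject)]


# The learner asks for database courses; the profile prefers German material.
query = [("?offer", "dc:subject", "databases")]
profile = Profile({"dc:language": "de", "dc:subject": "logic"})
augmented = augment(query, profile)

# Illustrative offer descriptions (assumed, not real Elena data).
offers = {
    "course-a": {"dc:subject": "databases", "dc:language": "de"},
    "course-b": {"dc:subject": "databases", "dc:language": "en"},
}

print(answers(query, offers))      # ['course-a', 'course-b']
print(answers(augmented, offers))  # ['course-a']
```

In this sketch the profile's `dc:subject` preference is ignored because the learner's query already constrains that property, so the explicit request always takes precedence over the profile.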
8. Summary and Acknowledgements
While metadata are useful and important in the
server/client-based environment of the World Wide Web,
for P2P environments they are absolutely crucial
in order to describe the resources managed by the peers. So far, P2P applications have used domain-specific formats and
metadata schemas, leading to a fragmentation of
the P2P world into niche markets.
In this paper, we have described the current status of the Edutella project, which addresses these shortcomings of current P2P applications by building on the W3C metadata standard RDF. The project is a multi-staged effort to
scope, specify, architect and implement an RDF-based metadata infrastructure for P2P networks
based on the recently announced JXTA framework. Edutella is the first system which brings together RDF and P2P concepts and exploits their
strengths in a common framework, suitable for
building general schema-based P2P networks for
distributed and dynamic information providers.
We have described the main architecture and
services provided by the Edutella framework, and
have discussed in detail the Edutella query service, which defines a common query exchange
model and language used to exchange queries and
query results between Edutella peers. We have
further discussed the basic registration and mediation services for distributed queries.
Our vision is to provide the metadata services
needed to enable interoperability between heterogeneous JXTA applications. We have already implemented several wrappers for integration of various metadata repositories into the Edutella network, and we have described the prototype environment we are using to test the Edutella framework. Our first application will focus on a network of educational service providers and matching user applications. Further infrastructure work
will concentrate on refining the existing architecture, improving the scalability of the Edutella network, and
adding further kinds of peers and services to the network.
Acknowledgements. This paper is based
on many fruitful discussions with participants
in the PADLR projects. We especially want
to thank Steffen Staab and Raphael Volz from
AIFB, Martin Wolpers and Hadhami Dhraief
from KBS, and Gustav Neumann and Bernd Simon from Vienna.
REFERENCES
1. IEEE Learning Technology Standards Committee, IEEE LOM Working Draft 6.1, http://ltsc.ieee.org/wg12/index.html (Apr. 2001).
2. IMS Global Learning Consortium Inc., IMS Learning Resource Metadata Specification v1.2.2, http://www.imsproject.org/metadata/index.html.
3. R. Dornfest, D. Brickley, The power of metadata, http://www.openp2p.com/pub/a/p2p/2001/01/18/metadata.html, excerpted from the book "Peer-to-Peer: Harnessing the Power of Disruptive Technologies" (Jan. 2001).
4. The Edutella Project, http://edutella.jxta.org/.
5. O. Lassila, R. R. Swick, W3C Resource Description Framework (RDF) Model and Syntax Specification, http://www.w3.org/TR/REC-rdf-syntax/, W3C Recommendation (Feb. 1999).
6. D. Brickley, R. V. Guha, W3C RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/1998/WD-rdf-schema/, W3C Working Draft (Nov. 2002).
7. L. Gong, Project JXTA: A technology overview, Tech. rep., SUN Microsystems, http://www.jxta.org/project/www/docs/TechOverview.pdf (Apr. 2001).
8. ADL Technical Team, SCORM Specification v1.2, http://www.adlnet.org (Oct. 2001).
9. Project JXTA Homepage, http://www.jxta.org/.
10. SUN Microsystems, JXTA v1.0 Protocols Specification, http://spec.jxta.org/v1.0/docbook/JXTAProtocols.html (2001).
11. A. Silberschatz, H. F. Korth, S. Sudarshan, Database Systems Concepts, 4th Edition, McGraw-Hill Higher Education, 2001.
12. M. Kifer, G. Lausen, J. Wu, Logical foundations of object-oriented and frame-based languages, Journal of the ACM 42 (4) (1995) 741–843.
13. M. Jarke, R. Gallersdörfer, M. Jeusfeld, M. Staudt, S. Eherer, ConceptBase - a deductive object base for meta data management, Journal on Intelligent Information Systems 4 (2) (1995) 167–192.
14. W. Nejdl, H. Dhraief, M. Wolpers, O-telos-rdf: A resource description format with enhanced meta-modeling functionalities based on o-telos, in: Workshop on Knowledge Markup and Semantic Annotation at the First International Conference on Knowledge Capture (K-CAP'2001), Victoria, BC, Canada, 2001.
15. B. McBride, Jena: Implementing the rdf model and syntax specification, Tech. rep., Hewlett Packard Laboratories, Bristol, UK, http://www.hpl.hp.com/semweb/index.html (2000).
16. J. W. Lloyd, R. W. Topor, Making prolog more expressive, Journal of Logic Programming 3 (1984) 225–240.
17. M. Nilsson, M. Palmér, Conzilla - towards a concept browser, Tech. Rep. CID-53, TRITA-NA-D9911, Department of Numerical Analysis and Computing Science, KTH, Stockholm, http://kmr.nada.kth.se/papers/ConceptualBrowsing/cid 53.pdf (1999).
18. T. Przymusinski, Every logic program has a natural stratification and an iterated least fixed point model, in: ACM Symposium on Principle of Database Systems (PODS), 1989, pp. 11–21.
19. W. Nejdl, W. Siberski, B. Simon, J. Tane, Towards a modification exchange language for distributed rdf repositories, in: 1st International Semantic Web Conference (ISWC2002), Sardinia, Italy, 2002.
20. P. Hayes, RDF semantics, Tech. rep., W3C Working Draft (Nov. 2002).
21. S. Abiteboul, P. Buneman, D. Suciu, Data on the Web, Morgan Kaufmann Publishers, 2000.
22. M. Sintek, S. Decker, TRIPLE - A query, inference, and transformation language for the Semantic Web, in: 1st International Semantic Web Conference (ISWC2002), Sardinia, Italy, 2002.
23. S. Decker, M. Sintek, W. Nejdl, The model-theoretic semantics of TRIPLE, submitted for publication (Nov. 2002).
24. G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, Rql: A declarative query language for rdf, in: 11th International Conference on the WWW, Honolulu, Hawaii, USA, 2002, http://www.ics.forth.gr/isl/publications/paperlink/dql-rdf.pdf.
25. J. Broekstra, Sesame RQL: a Tutorial, Aidministrator Nederland, http://sesame.aidministrator.nl/publications/rql-tutorial.html (May 2002).
26. G. Karvounarakis, V. Christophides, D. Plexousakis, S. Alexaki, Querying community web portals, http://www.ics.forth.gr/proj/isst/RDF/RQL/, submitted for publication (2001).
27. Apache Software Foundation, Apache Xindice, http://xml.apache.org/xindice/.
28. S. Kokkelink, R. Schwänzl, Expressing Qualified Dublin Core in RDF/XML, DCMI, http://dublincore.org/documents/dcq-rdf-xml/ (Apr. 2002).
29. J. Clark, S. deRose, XML Path Language (XPath), version 1.0, Tech. rep., W3C, W3C Recommendation (Nov. 1999).
30. T. Risch, V. Josifovski, Distributed data integration by object-oriented mediator servers, Concurrency and Computation: Practice and Experience 13 (11) (2001) 933–953.
31. V. Josifovski, T. Risch, Functional query optimization over object-oriented views for data integration, Journal of Intelligent Information Systems (JIIS) 12 (2-3) (1999) 165–190.
32. V. Josifovski, T. Risch, Integrating heterogeneous overlapping databases through object-oriented transformations, in: 25th Conf. on Very Large Databases (VLDB'99), Edinburgh, Scotland, 1999, pp. 435–446, http://www.dis.uu.se/~udbl/publ/vldb99.pdf.
33. W. Litwin, T. Risch, Main memory oriented optimization of oo queries using typed datalog with foreign predicates, IEEE Transactions on Knowledge and Data Engineering 4 (6) (1992) 517–528.
34. G. Wiederhold, Mediators in the architecture of future information systems, IEEE Computer 25 (3) (1992) 38–49.
35. V. Josifovski, T. Katchaounov, T. Risch, Optimizing queries in distributed and composable mediators, in: 4th Conference on Cooperative Information Systems, CoopIS'99, Edinburgh, Scotland, 1999, pp. 435–446, http://www.dis.uu.se/~udbl/publ/coopis99.pdf.
36. V. J. T. Katchaounov, T. Risch, Scalable view expansion in a peer mediator system, in: 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003), Kyoto, Japan, 2003, http://user.it.uu.se/~torer/publ/ovdl.pdf.
37. V. Josifovski, T. Risch, Query decomposition for a distributed object-oriented mediator system, Distributed and Parallel Databases 11 (3) (2002) 307–336.
38. H. Dhraief, W. Nejdl, B. Wolf, M. Wolpers, Open learning repositories and metadata modeling, in: International Semantic Web Working Symposium (SWWS), Stanford, CA, 2001.
39. C. Qu, W. Nejdl, Towards interoperability and reusability of learning resources: A SCORM-conformant courseware for computer science education, in: Proc. of the 2nd IEEE International Conference on Advanced Learning Technologies (IEEE ICALT 2002), Kazan, Tatarstan, Russia, 2002.
40. S. Handschuh, S. Staab, Authoring and annotation of web pages in cream, in: 11th International Conference on the WWW, Honolulu, Hawaii, USA, 2002.
41. A. Mädche, S. Staab, R. Studer, Y. Sure, R. Volz, Seal - tying up information integration and web site management by ontologies, IEEE Data Engineering Bulletin, http://www.aifb.uni-karlsruhe.de/~sst/Research/Publications/data-engineeringbulletin2002.pdf.
42. B. Ahlborn, W. Nejdl, W. Siberski, B. Simon, J. Tane, Oai-p2p: A peer-to-peer network for open archives, in: Workshop on Distributed Computing Architectures for Digital Libraries, 31st Intl. Conference on Parallel Processing, Vancouver, Canada, 2002.
43. M. Wolpers, W. Nejdl, I. Brunkhorst, An O-telos provider peer for the rdf-based edutella p2p-network, in: Semantic Authoring, Annotation and Knowledge Markup Workshop (SAAKM 2002) at 15th European Conf. on Artificial Intelligence, Lyon, France, 2002.
44. B. Simon, Z. Miklos, W. Nejdl, M. Sintek, J. Salvachua, Elena: A mediation infrastructure for educational services, Tech. rep., University of Hannover, Germany, http://www.kbs.uni-hannover.de/Arbeiten/Publikationen/2002/elena draft simon.pdf (Nov. 2002).