An RDF modification protocol, based on the needs of editing Tools

Enoksson, Fredrik; Palmér, Matthias; Naeve, Ambjörn

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/220792670 An RDF modification protocol, based on the needs of editing Tools Conference Paper · January 2007 DOI: 10.1007/978-0-387-77745-0_18 · Source: DBLP CITATIONS READS 0 20 3 authors, including: Fredrik Enoksson 12 PUBLICATIONS 38 CITATIONS SEE PROFILE Ambjörn Naeve KTH Royal Institute of Technology 103 PUBLICATIONS 1,566 CITATIONS SEE PROFILE Some of the authors of this publication are also working on these related projects: Im writing a book with the working title: Infrastructures for service-based and evolvable crossinstitutional reasoning View project All content following this page was uploaded by Ambjörn Naeve on 28 November 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. An RDF Modification Protocol, based on the Needs of Editing Tools Fredrik Enoksson1, Matthias Palmér1, Ambjörn Naeve1 1 Royal Institute of Technology(KTH/CSC), Lindstedtsv. 5, 100 44 Stockholm, Sweden {fen,matthias,amb}@csc.kth.se Abstract. The use of RDF on the web is increasing, unfortunately the amount of editing tools suitable for end users without knowledge of technicalities of the language are not so common. We believe that a vital ingredient for the editing tools to flourish is a working remote modification protocol. This will allow editing tools to be developed separately from triple-stores and make them more flexible and reusable. Several initiatives for remote modification exists already but have not gained wide spread adoption. In this paper we will show that most of them fall short when it comes to edit arbitrary RDF construct, especially in combination with typical requirements of editing tools. We will first list these requirements, then propose a solution and finally outline an implementation. With this implementation we will also see how annotation profiles, a configuration mechanism for RDF metadata editors, has the additional feature of making modification requests very precise. 1 Introduction RDF is increasingly used for expressing metadata on the web, consequently appropriate end-user editing tools are in demand. To become more user-friendly such editing tools need to hide the complex RDF-structure behind. This is typically achieved by displaying the metadata in forms and focusing on a single or a few central resources at a time. Beyond the user-interface layer such editing tools need to access a triple-store where the RDF is stored. A common solution is to use the triplestores own API. Unfortunately this approach makes it harder to separate editing tools from the underlying triple-store which has a negative impact on both the variety of triple-stores as well as the success of well designed editing tools. Hence, a common protocol for remote modification of RDF is desirable. Earlier initiatives, such as [1] and [2], have not gained wide acceptance, probably due to the lack of a common query language which more or less is a prerequisite. Now, when SPARQL and the closely related SPARQL protocol for RDF endorsed by W3C has matured, the prospect for a wide acceptance of a remote modification protocol looks brighter. A recent initiative, SPARUL [3], builds on top of SPARQL and looks promising, especially since much of the work done for supporting SPARQL in a triple-store can be reused. However, when taking a closer look, serious limitations from the perspective of an editing tool surfaced. In this paper we will first consider the requirements on a protocol that supports remote modification from the perspective of an editing tool, then list possible approaches for a remote modification protocol, and finally discuss the deficiencies in SPARUL and also outline a simple solution in accordance with the introduced requirements for editing tools. 2 Remote Modification Protocol The following list of requirements are not complete. Rather, they represent requirements from the perspective of editing tools. Even though they might seem quite natural and simple, they are not so easy to fulfill. Especially when taking into account the common usage of blank nodes in RDF . Resource centric – all kinds of subgraphs reachable (possible via intermediate blank nodes) from a named node (resource identified through a URI) should be modifiable. Concise modifications – The modification requests should be concise, not inefficient by transferring to much or to little data at a time. Without side effects – parts of the graph that are not to be modified should be left intact. More specifically, you should not be required to have knowledge of all parts of the graph to be able to modify it successfully. Application independent – There should not be any built-in knowledge in the protocol of specific properties or resources. 2.1 Different approaches for a Remote Modification Protocol The naive approaches for remote modification, i.e. updating one statement at a time or the entire RDF graph, are both flawed. Updating one statement at a time would yield a chatty protocol where a single update operation could require hundreds of requests. On the other side, updating the whole RDF graph could result in the transfer of large amounts of data where large parts are sent unmodified back and forth. It would also be problematic to support simultaneous updates with this approach as the entire graph would have to be locked. A better approach could be to send only deltas (differences) between RDF graphs as described in [4]. The described strong delta is especially interesting as they can be applied to subsets of the whole RDF graph. Unfortunately, the outlined algorithm breaks down whenever there are blank nodes in the graph unless there are unique ways to identify them. (Breaking or leaving orphaned blank nodes in the graph is of course not acceptable, see the requirement 'without side effects'.) As the general problem of identifying them requires finding the largest common subgraph which is a graph isomorphism problem which in turn has been proven to be NP-complete, see discussion in [4], another path has been taken. Instead, the algorithm relies on knowledge of how the graph was constructed, i.e. according to which OWL ontology. Hence, if there are inbound functional or outbound inversely functional properties from a blank node to another identifiable node in the graph the blank node can also be identified. Unfortunately, there are a lot of real world situations when the data does not follow an OWL ontology, or even if there is an OWL ontology it may not be known by the tool or simply that there is not enough functional or inversely functional properties to uniquely identify the blank nodes. Taken together, this approach seem to be too brittle to base a remote modification protocol on. Another approach is sending an easily identified subgraph that encompasses the modification. To avoid the problem of preserving identity of blank nodes, the subgraph should be defined in such a way that it shares no blank nodes with the rest of the graph. In the general case, this requirement could mean that the subgraph will be large, or even identical to the whole graph, for example if the graph consists solely of blank nodes. However, in nearly all real editing scenarios that we care about from the perspective of the requirement 'resource centric' listed above, there is a mixture of named nodes and blank nodes. Furthermore, the blank nodes are typically arranged into tree like data-structures that are reachable from the named nodes and not interconnected in a larger graph except via the named nodes. This is not something we postulate here, but rather a consequence of an established best practice where you name the resources that you express metadata on with URIs. From this observation we realize that a useful method of calculating a subgraph is the anonymous closure (or the closely related concise bounded descriptions), i.e. starting from a named resource and then recursively including all statements until other named resources are encountered. It is important that the calculation of the subgraph is done deterministically, as with anonymous closure, it has to be done twice, first to be accessed for remote modification and second to be removed from the bigger graph before the modified subgraph is inserted. The alternatives, to calculating the subgraph twice, are either removing the subgraph directly or keeping a copy of the subgraph on the server side for later removal. The first alternative will yield a graph where data is missing from time to time, this should clearly be avoided. The second alternative requires a stateful protocol and sessions to keep track of who initiated a modification on a specific subgraph. This should also be avoided both to avoid complexity and to be compatible with the principles of REST (sometimes referred to as the architecture of the web). Hence, the preferred alternative is calculating the subgraph twice. Even though the anonymous closure subgraph approach is simple and solves the problem with graphs with blank nodes in it, it has to be complemented to allow 'concise modifications' of larger subgraphs including several named nodes. (See above for the 'concise modification' requirement.) If you know the names of all the named nodes you can simply take the anonymous closure of each and one of them. However, this is not always the case, instead you typically only know the name of one of the named nodes and how the other nodes are connected to it. In this case you have to express the relationship from the known named node to the other nodes in a query language and then require the anonymous closure of all the matches. With SPARQL this can be almost achieved with graph patterns and the DESCRIBE option. Unfortunately, DESCRIBE is formally undefined and left to the triple-store to implement, however, for general purpose triple-stores the anonymous closure algorithm or it's close counterparts seems to be frequently used. 2.2 Deficiencies in SPARUL The main idea with SPARUL is to specify which statements to DELETE and which to INSERT into a specific model. In it simplest form the statements are simply listed, and the approach has the limitations of sending deltas without the elaborate scheme to identify blank nodes as discussed above. Hence, in this simple form, statements with blank nodes can be inserted but never removed or modified. In the more advanced case the delete and insert blocks contain templates which generate the statements to DELETE and INSERT via matching of a WHERE clause. Modifying subgraphs containing blank nodes is in this case possible but awkward as: First, it requires the WHERE clause to uniquely identify the right subgraph and capture the blank nodes in appropriate variables. Second the templates in DELETE and INSERT has to be carefully constructed to express the modification to be done. For simple and well known metadata and specific applications this is perhaps feasible. But in the general, the required algorithm would be complex and be done in every compliant client that want to use the protocol. We argue that a better approach is to allow the DELETE to be combined with DESCRIBE based on the variables matched in WHERE. In this case it will be possible to delete a series of anonymous closures according to the principle described above and then list the modified subgraph to be inserted in INSERT. 3 Approach: calculating the subgraph twice The solution suggested in the last part of section 2.1 will in the following subsection be described in a more detailed way. In following section 4 an implementation specific way of doing this in a dynamic way by reusing a configuration mechanism for RDF metadata editors. 3.1 Retrieving a proper subgraph to edit In order to find the proper subgraph a resource is needed as a starting point. This the central resource from where to start the search of what related resources to be edited. Since not all the resources connected to the starting point is known a pattern has to be provided. This way arbitrary deep constructs can be retrieved by using the DESCRIBE query in SPARQL. As said before in this paper DESCRIBE is not formally defined and can therefore return different answer on different implementations. For the purpose of retrieving a proper subgraph all the direct properties of the resource, the Concise Bounded Description for the resource and all properties for the resources matched in the pattern needs to be returned. In the following an example will be given where we want to edit the dc:title, dc:creator and the dc:subject for a given resource. The RDF graph on the remote storage is depicted in figure 1 Figure 1: An example RDF graph on the remote storage In order to retrieve the subgraph with a DESCRIBE-query the following question is enough: DESCRIBE <http://www.example.com/lo1> ?creator ?subject WHERE { OPTIONAL { <http://www.example.com/lo1> <http://purl.org/dc/terms/creator> OPTIONAL { <http://www.example.com/lo1> <http://purl.org/dc/terms/subject> } ?creator . } ?subject . } From the pattern in this example, a model with all properties and values for the resource http://www.example.com/lo1 will be returned, and, all properties and values for potential resources matching the variables ?creator and ?subject. The reason that we do not include a title property in the query is because it is a direct property. The reason why we include subject and creator is that they might point to non-anonymous resources. The use of OPTIONAL for every property also assures that if one property will not match, a graph will anyway be returned with all the properties if such a resource exist. 3.2 Updating the remote storage When the retrieved model has been modified by the application the subgraph on the remote storage is to be updated with these modification. An update-request to the remote storage consists of two parts, first, an indication of which subgraph to remove and second, a subgraph to be inserted as a replacement. This subgraph have to be sent by serialising RDF into whatever serialisation is supported by the remote storage. The subgraph to remove can on the other hand be indicated by a query (that is, the same query used to retrieve it in the first place). After the remote server have received the query to calculate the subgraph to remove and the model to insert, it first need to calculate from the query what to remove and from that remove the subgraph. After that the subgraph to be inserted can be put into the model on the remote storage. 4. Implementation An implementation of the described approach in section 3 have been made by the authors of this paper. The remote storage used is version 3.1 of the Joseki1 server and the editor application was implemented with the SHAME2 code library. It is a code library used to built editors that can be configured with RDF metadata Annotation Profiles, as described in [5] and [6]. An RDF metadata Annotation Profile consists of a Form Template and a Graph Pattern, where the latter can be used to fairly easy create a query used to retrieve the subgraph to edit from the remote storage. A Graph Pattern is expressed in an RDF query language like SPARQL, it defines the structure of the metadata to be edited and also act as a template when new metadata structures are to be created. The Graph Pattern is a tree structure where the root-node variable matches the resource to be edited and intermediate resources are variable nodes that matches nodes in the RDFmodel. A query to send to the remote storage can be constructed from the Graph Pattern by removing the leafs from the Graph Pattern and send it as a DESCRIBE query. As the Annotation Profile is constructed from the perspective of usability for the end users, you can expect that it encompasses a suitable subgraph to update in one operation. And as the DESCRIBE query is constructed from the graph pattern it will correspond to a concise modification (one of our requirements). In the implementation this is sent to the Joseki server over HTTP, and returned is the calculated subgraph, or an empty graph if no subgraph can be calculated. Once the model has been retrieved it is edited by applying the method that is described in [5] and [6]. The editing process, described shortly here, will match the Graph Pattern against the retrieved subgraph, and create a binding to the matching variables. The bindings are combined with the Form Template that makes it possible to create a Graphical User Interface that also hides all the complexity from the end user. This makes it possible for the end user to modify the subgraph in a rather simple form-based manner. When the changes to the subgraph are finished, ie the end user decides to save the modifications, the modified subgraph is serialised into RDF/XML and the query to calculate the original subgraph is expressed as a DESCRIBE query in SPARQL, as said before is the same as the one to retrieve it. For the Joseki-server to perform the update we have to implement and add a service called Update that is called with these two arguments. On the Joseki server the operation of removing the calculated 1 2 http://www.joseki.org http://kmr.nada.kth.se/shame subgraph is performed first followed by inserting the modified subgraph. Joseki is using Jena to handle RDF and the two operations is implemented using the methods add and remove defined in the interface Model in the Jena API. 5. Conclusion In this paper we have described several initiatives/approaches to remotely edit RDF graphs, according to the requirements of being Resource centric, allowing Concise modifications, being Without side effects, and being Application independent, We specifically discussed a recent initiative, SPARUL, that are very promising but falls short regarding handling blank nodes properly in combination with the requirements. The only approach that could meet all the requirements for the needs of an editing tool was the one described in section 3. The approach relies on that most graphs consists of a mixture of blank and non blank nodes and that replacing whole subgraphs calculated from one or a few starting points corresponds well to the extent of a typical update. The update is performed by retrieving a subgraph, modifying it and then submitting a the modified subgraph back to replace the original. In addition to the modification mechanism, the paper has shortly outlined how to use Annotation Profiles, a configuration mechanism for remote editing tools, to retrieve the subgraph needed for editing. Consequently, if such an approach is used, the modification protocol will work automatically, avoiding manual construction of queries. A rather important issue that is out of the scope of this paper is how to handle concurrent changes of the same subgraph. The subgraph is not locked after the first retrieval of it and changes could be done inside the same subgraph by another user. Additional to the hassle of updates being overwritten falsely, this might lead to inconsistencies in the RDF, since differences in the queries used to extract the subgraph may lead to orphaned constructs. Some kind of additional restrictions on the queries may be needed for a consistent locking mechanism to be feasible. Acknowledgement This work has been carried out with financial support from the EU-FP6 project LUISA, which the authors gratefully acknowledge. References 1. 2. 3. 4. 5. 6. Seaborne, A.: An RDF NetAPI, Proceedings of the First International Semantic Web Conference on The Semantic Web, 2002 Nejdl, W., Siberski, W., Simon, B., Tane, J.: Towards a Modification Exchange Language for Distributed RDF Repositories. Proceedings of the First International Semantic Web Conference on The Semantic Web, 2002 Seaborne, A., Manjunath, G., SPARQL/Update A language for updating RDF graphs, http://jena.hpl.hp.com/~afs/SPARQL-Update.html Berners-Lee, T., Connolly, D., Delta: an ontology for the distribution of differences between RDF graphs, http://www.w3.org/DesignIssues/Diff. Palmér, M, Enoksson, F., Naeve, A., D3.2: Annotation Profile Specification, Retrieved, August 30, 2007, from http://www.luisa-project.eu Palmér, M, Enoksson, F., Nilsson, M., Naeve, A., Annotation Profiles: Configuring forms to edit RDF, Proceedings of the Dublin Core Metadata Conference, Singapore, 2007

Log In

An RDF modification protocol, based on the needs of editing Tools

An RDF modification protocol, based on the needs of editing Tools

Related Papers

RELATED PAPERS