
Relational processing of RDF queries: a survey

2010, ACM SIGMOD Record

Sherif Sakr and Ghazi Al-Naymat
School of Computer Science and Engineering
University of New South Wales, Sydney, Australia
{ssakr,ghazi}@cse.unsw.edu.au

ABSTRACT
The Resource Description Framework (RDF) is a flexible model for representing information about resources on the web. With the increasing amount of RDF data becoming available, efficient and scalable management of RDF data has become a fundamental challenge in achieving the Semantic Web vision. The RDF model has attracted the attention of the database community, and many researchers have proposed different solutions to store and query RDF data efficiently. This survey focuses on using relational query processors to store and query RDF data. We provide an overview of the different approaches and classify them according to their storage and query evaluation strategies.

1. INTRODUCTION
The goal of the Semantic Web is to provide a common framework for data sharing across applications, enterprises, and communities. By giving data semantic meaning (through metadata), this framework allows machines to consume, understand, and reason about the structure and purpose of the data. The core of the Semantic Web is built on the Resource Description Framework (RDF) data model [17]. RDF describes a resource using a set of statements in the form of (subject, predicate, object) triples, also known as (subject, property, value) triples. The subject is the resource, the predicate is the characteristic being described, and the object is the value of that characteristic. Efficient and scalable management of RDF data is a fundamental challenge at the core of the Semantic Web, and several research efforts have addressed it [1, 2, 6, 13, 16, 28].
Relational database management systems (RDBMSs) have repeatedly shown that they are efficient, scalable and successful at hosting types of data that were not originally anticipated to be stored in relational databases, such as complex objects [27], spatio-temporal data [5] and XML data [11]. This survey focuses on using relational query processors to store and query RDF data. We give an overview of the different approaches and classify them according to their storage and indexing strategies.

The rest of the paper is organized as follows. Section 2 introduces preliminaries of the RDF data model and the W3C standard RDF query language, SPARQL. It also introduces the main alternative relational approaches for storing and querying RDF. Sections 3, 4 and 5 provide the details of the different techniques in each of the alternative relational approaches. Finally, Section 6 concludes the paper and provides some suggestions for possible future research directions on the subject.

2. RDF-SPARQL PRELIMINARIES
The Resource Description Framework (RDF) is a W3C recommendation that has rapidly gained popularity as a means of expressing and exchanging semantic metadata, i.e., data that specifies semantic information about data. RDF was originally designed for the representation and processing of metadata about remote information sources, and it defines a model for describing relationships among resources in terms of uniquely identified attributes and values. The basic building block in RDF is a simple tuple model, (subject, predicate, object), that expresses different types of knowledge as fact statements. Each statement is interpreted as "subject S has property P with value O", where S and P are resource URIs and O is either a URI or a literal value. Thus, any object that is a URI can play the role of the subject in another triple, which amounts to chaining two labeled edges in a graph-based structure.
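The chaining of triples described above can be illustrated with a minimal Python sketch. It assumes an in-memory list of (subject, predicate, object) tuples taken from the sample data used later in this paper (Figure 3); the helper names `objects` and `follow` are hypothetical and do not belong to any RDF library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# A few statements from the sample data of Figure 3 (illustrative subset).
TRIPLES: List[Triple] = [
    ("Id1", "hasTitle", "Querying RDF Data"),
    ("Id1", "authoredBy", "Id2"),
    ("Id2", "hasName", "John"),
    ("Id2", "webPage", "www.cse.unsw.edu.au/~john"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Hypothetical helper: all values O such that (subject, predicate, O) holds."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start: str, path: List[str]) -> List[str]:
    """Hypothetical helper: chain labeled edges, so each reached object
    becomes the subject for the next predicate on the path."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier


print(follow("Id1", ["authoredBy", "webPage"]))  # ['www.cse.unsw.edu.au/~john']
```

Here the object `Id2` of the `authoredBy` statement is reused as the subject of the `webPage` statement, which is exactly the two-edge chain in the RDF graph.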
In addition, RDF allows a form of reification in which an RDF statement can itself be the subject or object of a triple. One of the clear advantages of the RDF data model is its schema-free structure, in contrast to the entity-relationship model, where the entities, their attributes and their relationships to other entities are strictly defined. In RDF, the schema may evolve over time, which fits well with the modern notion of data management, dataspaces, and its pay-as-you-go philosophy [14]. Figure 1 illustrates a sample RDF graph.

The SPARQL query language is the official W3C standard for querying and extracting information from RDF graphs [22]. It is the counterpart of select-project-join queries in the relational model. It is based on a powerful graph-matching facility, allows variables to be bound to components of the input RDF graph, and supports conjunctions and disjunctions of triple patterns. In addition, operators akin to relational joins, unions, left outer joins, selections and projections can be combined to build more expressive queries.

SIGMOD Record, December 2009 (Vol. 38, No. 4)

Figure 1: Sample RDF Graph

  SELECT ?Z WHERE {
    ?X hasTitle "Querying RDF Data" .
    ?X publicationType "Survey Paper" .
    ?X authoredBy ?Y .
    ?Y webPage ?Z .
  }

Figure 2: Sample SPARQL query

  Triples
  Subject  Predicate        Object
  Id1      publicationType  Survey Paper
  Id1      hasTitle         Querying RDF Data
  Id1      authoredBy       Id2
  Id2      hasName          John
  Id2      affiliatedBy     UNSW
  Id2      hasEmail         John@cse.unsw.edu.au
  Id2      webPage          www.cse.unsw.edu.au/~john
  Id1      editedBy         Id3
  Id3      hasName          Alice
  Id3      affiliatedBy     NICTA
  Id3      hasEmail         Alice@nicta.com.au
  Id3      roomNo           518

  SELECT T3.Object
  FROM Triples AS T1, Triples AS T2, Triples AS T3, Triples AS T4
  WHERE T1.Predicate = 'publicationType' AND T1.Object = 'Survey Paper'
    AND T2.Predicate = 'hasTitle' AND T2.Object = 'Querying RDF Data'
    AND T3.Predicate = 'webPage'
    AND T4.Predicate = 'authoredBy'
    AND T1.Subject = T2.Subject
    AND T4.Subject = T1.Subject
    AND T4.Object = T3.Subject

Figure 3: Relational Representation of Triple RDF Stores

A basic SPARQL query has the form:

  SELECT ?variable1 ?variable2 ... WHERE { pattern1 . pattern2 . ... }

where each pattern consists of a subject, a predicate and an object, each of which can be either a variable or a constant (a URI or a literal). The query specifies the known constants and leaves the unknowns as variables, which can occur in multiple patterns to express join operations. Hence, the query processor needs to find all possible variable bindings that satisfy the given patterns and return the bindings of the variables in the projection clause to the application. Figure 2 depicts a sample SPARQL query over the sample RDF graph of Figure 1 that retrieves the web page of the author of the survey paper titled "Querying RDF Data".

As noted earlier, RDBMSs have repeatedly proven efficient, scalable and successful at hosting types of data that were not originally anticipated to be stored in relational databases. In addition, they have shown their ability to handle vast amounts of data very efficiently using powerful indexing mechanisms.
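The evaluation semantics of the basic SPARQL query form can be sketched as a naive nested-loop matcher over an in-memory list of triples. This is a hypothetical illustration, not the algorithm of any system surveyed here: terms beginning with `?` are treated as variables, a variable bound by an earlier pattern acts as a join condition for later patterns, and the last step projects the selected variables. The data is a subset of the triples in Figure 3, and all function names are assumptions.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Subset of the Figure 3 triples (illustrative data).
TRIPLES: List[Triple] = [
    ("Id1", "publicationType", "Survey Paper"),
    ("Id1", "hasTitle", "Querying RDF Data"),
    ("Id1", "authoredBy", "Id2"),
    ("Id1", "editedBy", "Id3"),
    ("Id2", "hasName", "John"),
    ("Id2", "webPage", "www.cse.unsw.edu.au/~john"),
    ("Id3", "hasName", "Alice"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # An already-bound variable must agree: this is the join condition.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (nested-loop evaluation)."""
    if not patterns:
        yield binding
        return
    for triple in TRIPLES:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from evaluate(patterns[1:], extended)


def select(variables: List[str], patterns: List[Triple]) -> List[Tuple[str, ...]]:
    """Project the selected variables from every solution."""
    return [tuple(b[v] for v in variables) for b in evaluate(patterns, {})]


# The query of Figure 2.
figure2_query = [
    ("?X", "hasTitle", "Querying RDF Data"),
    ("?X", "publicationType", "Survey Paper"),
    ("?X", "authoredBy", "?Y"),
    ("?Y", "webPage", "?Z"),
]
print(select(["?Z"], figure2_query))  # [('www.cse.unsw.edu.au/~john',)]
```

Every relational scheme surveyed below computes the same set of bindings; they differ in how the triples are laid out and how the joins on shared variables are executed.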
The relational RDF stores can be mainly classified into the following categories:
• Vertical (triple) table stores: each RDF triple is stored directly in a three-column table (subject, predicate, object).
• Property (n-ary) table stores: multiple RDF properties of the same subject are modeled as the columns of an n-ary table.
• Horizontal (binary) table stores: RDF triples are modeled as one horizontal table or as a set of vertically partitioned binary tables (one table for each RDF property).
Figures 3, 4 and 5 illustrate examples of the three alternative relational representations of the sample RDF graph (Figure 1) and their associated SQL queries for evaluating the sample SPARQL query (Figure 2).

  Publication
  ID   publicationType  hasTitle           authoredBy  editedBy
  Id1  Survey Paper     Querying RDF Data  Id2         Id3

  Person
  ID   hasName  affiliatedBy  hasEmail              webPage                    roomNo
  Id2  John     UNSW          John@cse.unsw.edu.au  www.cse.unsw.edu.au/~john  NULL
  Id3  Alice    NICTA         Alice@nicta.com.au    NULL                       518

  SELECT Person.webPage
  FROM Person, Publication
  WHERE Publication.publicationType = 'Survey Paper'
    AND Publication.hasTitle = 'Querying RDF Data'
    AND Publication.authoredBy = Person.ID

Figure 4: Relational Representation of Property Tables RDF Stores

3. VERTICAL (TRIPLE) STORES
Harris and Gibbins [12] have described the 3store RDF storage system. The storage system of 3store is based on a central triples table that holds the hashes of the subject, predicate, object and graph identifier. The graph identifier is zero if the triple resides in the anonymous background graph. A symbols table allows reverse lookups from a hash to the hashed value, for example, to return results. Furthermore, it allows SQL operations to be performed on pre-computed values in the data types of the columns without the use of casts. To evaluate a SPARQL query, the triples table is joined once for each triple pattern in the graph pattern, and each variable is bound to a value at the first slot in which it appears.
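The 3store layout and its one-join-per-pattern evaluation can be approximated in SQLite. This is a sketch under several assumptions: the hash function (a truncated MD5 digest), the table and column names, and the use of SQLite itself are illustrative choices, not details taken from 3store. Each triple pattern of Figure 2 gets its own alias of the triples table, the shared variables ?X and ?Y become equality conditions, and the hash bound to ?Z is decoded through the symbols table.

```python
import hashlib
import sqlite3


def h(text: str) -> int:
    """Hypothetical 63-bit hash (truncated MD5) that fits a signed SQLite INTEGER."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big") >> 1


TRIPLES = [
    ("Id1", "publicationType", "Survey Paper"),
    ("Id1", "hasTitle", "Querying RDF Data"),
    ("Id1", "authoredBy", "Id2"),
    ("Id2", "webPage", "www.cse.unsw.edu.au/~john"),
    ("Id1", "editedBy", "Id3"),
    ("Id3", "roomNo", "518"),
]

conn = sqlite3.connect(":memory:")
# graph = 0 stands for the anonymous background graph.
conn.execute("CREATE TABLE triples (subject INTEGER, predicate INTEGER, "
             "object INTEGER, graph INTEGER DEFAULT 0)")
conn.execute("CREATE TABLE symbols (hash INTEGER PRIMARY KEY, lexical TEXT)")
for triple in TRIPLES:
    conn.execute("INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)",
                 tuple(h(term) for term in triple))
    conn.executemany("INSERT OR IGNORE INTO symbols VALUES (?, ?)",
                     [(h(term), term) for term in triple])

# One alias of the triples table per triple pattern in Figure 2.
SQL = """
SELECT s.lexical
FROM triples t1, triples t2, triples t3, triples t4, symbols s
WHERE t1.predicate = :hasTitle AND t1.object = :title
  AND t2.predicate = :pubType AND t2.object = :survey AND t2.subject = t1.subject
  AND t3.predicate = :authoredBy AND t3.subject = t1.subject
  AND t4.predicate = :webPage AND t4.subject = t3.object
  AND s.hash = t4.object
"""
params = {"hasTitle": h("hasTitle"), "title": h("Querying RDF Data"),
          "pubType": h("publicationType"), "survey": h("Survey Paper"),
          "authoredBy": h("authoredBy"), "webPage": h("webPage")}
print(conn.execute(SQL, params).fetchall())  # [('www.cse.unsw.edu.au/~john',)]
```

Comparing fixed-width integer hashes rather than variable-length strings is what keeps each of these self-joins cheap; the textual value is looked up only for the projected result.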
Subsequent occurrences of a variable in the graph pattern constrain the corresponding joins to its initial binding. To produce the intermediate results table, the hashes of the SPARQL variables that must be returned in the result set are projected, and these hashes are then joined to the symbols table to obtain the textual representation of the results.

  publicationType(ID, value): (Id1, Survey Paper)
  hasTitle(ID, value):        (Id1, Querying RDF Data)
  authoredBy(ID, value):      (Id1, Id2)
  editedBy(ID, value):        (Id1, Id3)
  hasName(ID, value):         (Id2, John), (Id3, Alice)
  affiliatedBy(ID, value):    (Id2, UNSW), (Id3, NICTA)
  hasEmail(ID, value):        (Id2, John@cse.unsw.edu.au), (Id3, Alice@nicta.com.au)
  webPage(ID, value):         (Id2, www.cse.unsw.edu.au/~john)
  roomNo(ID, value):          (Id3, 518)

  SELECT webPage.value
  FROM publicationType, hasTitle, authoredBy, webPage
  WHERE publicationType.value = 'Survey Paper'
    AND hasTitle.value = 'Querying RDF Data'
    AND publicationType.ID = hasTitle.ID
    AND publicationType.ID = authoredBy.ID
    AND authoredBy.value = webPage.ID

Figure 5: Relational Representation of Binary Tables RDF Stores

Neumann and Weikum [20] have presented the RDF-3X (RDF Triple eXpress) RDF query engine, which tries to overcome the criticism that triple stores incur too many expensive self-joins by creating an exhaustive set of indexes and relying on fast merge joins. The physical design of RDF-3X is workload-independent and eliminates the need for physical-design tuning by building indexes over all 6 permutations of the three dimensions that constitute an RDF triple. Additionally, indexes over count-aggregated variants for all three two-dimensional and all three one-dimensional projections are created. The query processor follows the RISC-style design philosophy [7]: it uses the full set of indexes on the triples table and relies mostly on merge joins over sorted index lists. The query optimizer relies on its cost model to find the lowest-cost execution plan and focuses mostly on join ordering and the generation of execution plans.
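The idea of materializing all six orderings of a triple can be sketched with sorted Python lists and binary search. This is a simplification under stated assumptions: a real engine would use disk-based, compressed index structures, the count-aggregated indexes are omitted, and the helper names are hypothetical. Because the bound terms of any triple pattern can be placed first in one of the six orderings, every pattern becomes a contiguous range scan.

```python
from bisect import bisect_left
from itertools import permutations
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
POSITION = {"S": 0, "P": 1, "O": 2}

# Illustrative subset of the Figure 3 triples.
TRIPLES: List[Triple] = [
    ("Id1", "hasTitle", "Querying RDF Data"),
    ("Id1", "authoredBy", "Id2"),
    ("Id1", "editedBy", "Id3"),
    ("Id2", "webPage", "www.cse.unsw.edu.au/~john"),
    ("Id2", "hasName", "John"),
    ("Id3", "hasName", "Alice"),
]


def build_indexes(triples: List[Triple]) -> Dict[str, List[Tuple[str, ...]]]:
    """Materialize the orderings SPO, SOP, PSO, POS, OSP and OPS as sorted lists."""
    indexes = {}
    for order in permutations("SPO"):
        indexes["".join(order)] = sorted(
            tuple(t[POSITION[c]] for c in order) for t in triples
        )
    return indexes


def choose_index(pattern: Triple) -> Tuple[str, Tuple[str, ...]]:
    """Put the bound (non-variable) positions first; they form the scan prefix."""
    bound = [c for c, term in zip("SPO", pattern) if not term.startswith("?")]
    free = [c for c, term in zip("SPO", pattern) if term.startswith("?")]
    prefix = tuple(term for term in pattern if not term.startswith("?"))
    return "".join(bound + free), prefix


def range_scan(index: List[Tuple[str, ...]], prefix: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Binary-search to the first entry >= prefix, then read while it matches."""
    result = []
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][: len(prefix)] == prefix:
        result.append(index[i])
        i += 1
    return result


indexes = build_indexes(TRIPLES)
order, prefix = choose_index(("?X", "hasName", "?N"))
print(order, range_scan(indexes[order], prefix))
# PSO [('hasName', 'Id2', 'John'), ('hasName', 'Id3', 'Alice')]
```

Scanning the PSO ordering for a bound predicate returns its subjects in sorted order, which is what lets the results of two such scans be combined with a merge join on the subject.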
In principle, selectivity estimation has a huge impact on plan generation. While this is a standard problem in database systems, the schema-free nature of RDF data makes it more challenging. RDF-3X employs dynamic programming for plan enumeration, with a cost model based on RDF-specific statistical synopses. It relies on two kinds of statistics: 1) specialized histograms, which are generic and can handle any kind of triple pattern and join, but have the disadvantage of assuming independence between predicates; and 2) frequent join paths in the data, which give more accurate estimates. During query optimization, the optimizer uses the join-path selectivity information when it is available; otherwise, it assumes independence and uses the histogram information. In [21], the authors extended this work by introducing a run-time technique for accelerating query execution. It uses a light-weight, RDF-specific technique for sideways information passing across different joins and index scans within the query execution plans. They also enhanced the selectivity estimator of the query optimizer by using very fast index lookups on specifically designed aggregation indexes, rather than relying on the usual kinds of coarse-grained histograms. This provides more accurate estimates at compile time, at a fairly small cost that is easily amortized by providing better directives for join-order optimization.

Weiss et al. [28] have presented the Hexastore RDF storage scheme, with a main focus on scalability and generality in its data storage, processing and representation. Hexastore is based on the idea of indexing the RDF data in a multiple indexing scheme [13]. It does not discriminate against any RDF element and treats subjects, properties and objects equally. Each RDF element type has its own special index structures built around it.
Moreover, every possible ordering of the importance or precedence of the three elements in an indexing scheme is materialized. Each index structure in a Hexastore centers around one RDF element and defines a prioritization between the other two elements. Two vectors are associated with each RDF element (e.g., a subject), one for each of the other two RDF elements (e.g., property and object). In addition, lists of the third RDF element are appended to the elements in these vectors. In total, six distinct indices are used for indexing the RDF data; they materialize all possible orders of precedence of the three RDF elements. A clear disadvantage of this approach is that Hexastore incurs a worst-case five-fold storage increase in comparison to a conventional triples table.

4. PROPERTY TABLE STORES
The property table approach was proposed to reduce the proliferation of self-joins involved in querying triple stores. Jena is an open-source toolkit for Semantic Web programmers [19]. It implements persistence for RDF graphs using an SQL database through a JDBC connection. The schema of the first version of Jena, Jena1, consisted of a statement table, a literals table and a resources table. The statement table (Subject, Predicate, ObjectURI, ObjectLiteral) contained all statements and referenced the resources and literals tables for subjects, predicates and objects. Two object columns were used to distinguish literal objects from resource URIs. The literals table contained all literal values, and the resources table contained all resource URIs in the graph. However, every query operation required multiple joins between the statement table and the literals table or the resources table. To address this problem, the Jena2 schema trades space for time. It uses a denormalized schema in which resource URIs and simple literal values are stored directly in the statement table.
To distinguish database references from literals and URIs, column values are encoded with a prefix that indicates the type of the value. A separate literals table is only used to store literal values whose length exceeds a threshold, such as blobs. Similarly, a separate resources table is used to store long URIs. By storing values directly in the statement table, many queries can be performed without a join. However, a denormalized schema uses more database space because the same value (literal or URI) is stored repeatedly; this increase in space consumption is addressed by using string compression schemes. Both Jena1 and Jena2 permit multiple graphs to be stored in a single database instance. In Jena1, all graphs were stored in a single statement table. Jena2, however, supports multiple statement tables in a single database, so that applications can flexibly map graphs to different tables. In this way, graphs that are often accessed together may be stored together, while graphs that are never accessed together may be stored separately.

Applications typically have access patterns in which certain subjects and/or properties are accessed together. For example, a graph of data about persons might have many occurrences of objects with the properties name, address, phone and gender that are referenced together. Jena2 uses property tables as a general facility for clustering properties that are commonly accessed together. A property table is a separate table that stores the subject-value pairs related by a particular property. A property table stores all instances of the property in the graph; that property does not appear in any other table used for the graph. In Jena1, each query is evaluated with a single SQL select query over the statement table. In Jena2, queries have to be generalized because there can be multiple statement tables for a graph.
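The prefix encoding of Jena2's denormalized statement table can be sketched as follows. The prefixes `U:`, `L:` and `R:`, the 32-character threshold, the `ex:` names and the single side table used for both long literals and long URIs are hypothetical simplifications; Jena2's actual encoding and its separate literals and resources tables differ in detail.

```python
from typing import Dict, List, Tuple

INLINE_LIMIT = 32  # hypothetical length threshold for storing a value inline


class StatementTable:
    """Denormalized statement table with type-prefixed column values (sketch)."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str, str]] = []
        # Hypothetical stand-in for Jena2's separate long-literal and long-URI tables.
        self.long_values: Dict[int, Tuple[str, str]] = {}

    def encode(self, value: str, kind: str) -> str:
        """kind is 'U' (URI) or 'L' (literal); long values become 'R' references."""
        if len(value) <= INLINE_LIMIT:
            return f"{kind}:{value}"
        ref = len(self.long_values)
        self.long_values[ref] = (kind, value)
        return f"R:{ref}"

    def decode(self, column_value: str) -> Tuple[str, str]:
        """Return (kind, value), following a reference into the side table if needed."""
        prefix, _, body = column_value.partition(":")
        if prefix == "R":
            return self.long_values[int(body)]
        return prefix, body

    def add(self, subject: str, predicate: str, obj: str, obj_is_uri: bool = False) -> None:
        self.rows.append((
            self.encode(subject, "U"),
            self.encode(predicate, "U"),
            self.encode(obj, "U" if obj_is_uri else "L"),
        ))


table = StatementTable()
table.add("ex:Id2", "ex:hasName", "John")  # short values are stored inline
table.add("ex:Id1", "ex:abstract", "A long literal that exceeds the inline limit")
print(table.rows)
print(table.decode(table.rows[1][2]))
```

Most lookups touch only the statement table; only values above the threshold cost an extra lookup, which is the space-for-time trade-off described above.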
Using knowledge of the frequent access patterns to construct the property tables and to influence the underlying database storage structures can provide a performance benefit and reduce the number of join operations during query evaluation.

Chong et al. [8] have introduced an Oracle-based SQL table function, RDFMATCH, to query RDF data. The results of the RDFMATCH table function can be further processed by SQL's rich querying capabilities and seamlessly combined with queries on traditional relational data. The core implementation of RDFMATCH translates a query into a self-join query on a triple-based RDF table store. The resulting query is executed efficiently by making use of B-tree indexes as well as by creating specialized subject-property materialized join views. These Subject-Property Matrix materialized join views are used to minimize the query processing overheads that are inherent in the canonical triple-based representation of RDF. The materialized join views are incrementally maintained based on user demand and query workloads. A special module is provided to analyze the table of RDF triples and estimate the size of various materialized views, based on which a user can define a subset of materialized views. For a group of subjects, the system defines a set of single-valued properties that occur together. These can be direct properties of these subjects or nested properties. A property p1 is a direct property of subject x1 if there is a triple (x1, p1, x2). A property pm is a nested property of subject x1 if there is a set of triples (x1, p1, x2), ..., (xm, pm, xm+1), where m > 1. For example, given the triples (John, address, addr1) and (addr1, zip, 03062), the zip property is considered a nested property of John.

Levandoski and Mokbel [15] have presented another property table approach for storing RDF data without any assumption about the query workload statistics.
The main goals of this approach are: (1) reducing the number of join operations required during RDF query evaluation by storing related RDF properties together, and (2) reducing the need to process extra data by tuning null storage to fall below a given threshold. The approach provides a tailored schema for each RDF data set that represents a balance between property tables and binary tables, and it is based on two main parameters: 1) the support threshold, which measures the strength of correlation between properties in the RDF data, and 2) the null threshold, which represents the percentage of null storage tolerated for each table in the schema. The approach involves two phases: clustering and partitioning. The clustering phase scans the RDF data to automatically discover groups of related properties (i.e., properties that exist together for a large number of subjects). Based on the support threshold, each set of n properties grouped together in the same cluster is a good candidate to constitute a single n-ary table, and the properties that are not grouped in any cluster are good candidates for storage in binary tables. The partitioning phase goes over the formed clusters and balances the trade-off between storing as many RDF properties in clusters as possible and keeping null storage to a minimum, based on the null threshold. The partitioning phase has two main concerns. The first is to ensure that there is no overlap between the clusters, so that each property exists in a single cluster. The second is to reduce the number of table accesses and unions necessary in query processing.

Matono et al. [18] have proposed a path-based relational RDF database. The main focus of this approach is to improve the performance of path queries by extracting and storing all reachable path expressions for each resource.
Thus, unlike flat triple stores or the property table approach, path queries require no join operations. In this approach, the RDF graph is divided into subgraphs, and each subgraph is then stored by applicable techniques in distinct relational tables. More precisely, all classes and properties are extracted from the RDF schema data, and all resources are extracted from the RDF data. Each extracted item is assigned an identifier and a path expression and stored in the corresponding relational table.

5. HORIZONTAL STORES
Abadi et al. [1] have presented SW-Store, a new DBMS that stores RDF data using a fully decomposed storage model (DSM) [10]. In this approach, the triples table is rewritten into n two-column tables, where n is the number of unique properties in the data. In each of these tables, the first column contains the subjects that define that property and the second column contains the object values for those subjects; subjects that do not define a particular property are simply omitted from the table for that property. Each table is sorted by subject, so that particular subjects can be located quickly and fast merge joins can be used to reconstruct information about multiple properties for subsets of subjects. For a multi-valued attribute, each distinct value is listed in a successive row in the table for that property. One advantage of this approach is that, while property tables need to be carefully constructed so that they are wide enough (but not too wide) to independently answer queries, the algorithm for creating tables in the vertically partitioned approach is straightforward and need not change over time. Moreover, in the property-class schema approach, queries that do not restrict on class tend to have many union clauses, while in the vertically partitioned approach all data for a particular property is located in the same table, and thus union clauses in queries are less common.
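The vertically partitioned layout and the subject-ordered merge join it enables can be sketched in a few lines of Python. This is an in-memory illustration only, with hypothetical function names; SW-Store itself keeps these two-column tables in a column-oriented DBMS.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative subset of the Figure 3 triples.
TRIPLES = [
    ("Id2", "hasName", "John"),
    ("Id3", "hasName", "Alice"),
    ("Id2", "hasEmail", "John@cse.unsw.edu.au"),
    ("Id3", "hasEmail", "Alice@nicta.com.au"),
    ("Id3", "roomNo", "518"),
]


def vertically_partition(triples) -> Dict[str, List[Tuple[str, str]]]:
    """One (subject, object) table per property, sorted by subject.
    Subjects lacking a property simply do not appear in its table."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join(left, right):
    """Merge-join two subject-sorted tables, handling multi-valued subjects."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on each side.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, lo in left[i:i_end]:
                for _, ro in right[j:j_end]:
                    result.append((ls, lo, ro))
            i, j = i_end, j_end
    return result


tables = vertically_partition(TRIPLES)
print(merge_join(tables["hasName"], tables["hasEmail"]))
# [('Id2', 'John', 'John@cse.unsw.edu.au'), ('Id3', 'Alice', 'Alice@nicta.com.au')]
```

Because both inputs arrive sorted by subject, each table is read once, and a multi-valued property simply contributes several consecutive rows to the same run.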
The implementation of SW-Store relies on a column-oriented DBMS, C-Store [26], to store tables as collections of columns rather than as collections of rows. In standard row-oriented databases (e.g., Oracle, DB2, SQL Server, Postgres), entire tuples are stored consecutively. The problem with this is that if only a few attributes are accessed per query, entire rows need to be read into memory from disk before the projection can occur. By storing data in columns rather than rows, projection comes for free: only those columns that are relevant to a query need to be read.

Beckmann et al. [3] and Chu et al. [9] have argued that storing a sparse data set (like RDF) in multiple tables can cause problems. They suggested storing a sparse data set in a single table, while the complexities of sparse data management are handled inside an RDBMS through the addition of an interpreted storage format. The proposed format starts with a header that contains fields such as relation-id, tuple-id and tuple length. When a tuple has a value for an attribute, the attribute identifier, a length field (if the type is of variable length) and the value appear in the tuple. The attribute identifier is the id of the attribute in the system catalog; attributes that appear in the system catalog but not in the tuple are null for that tuple. Since the interpreted format stores nothing for null attributes, sparse data sets in a horizontal schema can in general be stored much more compactly in this format. While the interpreted format has storage benefits for sparse data, retrieving the values of attributes in tuples is more complex. In fact, the format is called interpreted because the storage system must discover the attributes and values of a tuple at tuple-access time, rather than using precompiled position information from a catalog, as the positional format allows.
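A sketch of the interpreted attribute storage format, under assumed field widths (32-bit relation-id and tuple-id, 16-bit tuple length, attribute-id and value length) and with every value treated as a variable-length string. These widths, the attribute ids and the function names are hypothetical and not taken from [3]. The `extract` helper shows why attribute retrieval has to scan the record: offsets are discovered at access time rather than read from the catalog.

```python
import struct
from typing import Dict, Optional

# Hypothetical little-endian field widths.
HEADER = struct.Struct("<IIH")  # relation-id, tuple-id, total tuple length
ATTR = struct.Struct("<HH")     # attribute-id, value length


def encode(relation_id: int, tuple_id: int, values: Dict[int, str]) -> bytes:
    """Serialize only the non-NULL attributes; absent attribute ids are NULL."""
    body = bytearray()
    for attr_id in sorted(values):
        data = values[attr_id].encode("utf-8")
        body += ATTR.pack(attr_id, len(data)) + data
    return HEADER.pack(relation_id, tuple_id, HEADER.size + len(body)) + bytes(body)


def extract(record: bytes, attr_id: int) -> Optional[str]:
    """Scan the record from the start until attr_id is found (None means NULL)."""
    _, _, total = HEADER.unpack_from(record, 0)
    offset = HEADER.size
    while offset < total:
        current, length = ATTR.unpack_from(record, offset)
        offset += ATTR.size
        if current == attr_id:
            return record[offset:offset + length].decode("utf-8")
        offset += length
    return None


# A sparse Person tuple with hypothetical catalog ids 1 = hasName, 4 = webPage;
# every other catalog attribute is NULL and occupies no space.
record = encode(7, 2, {1: "John", 4: "www.cse.unsw.edu.au/~john"})
print(extract(record, 4))  # www.cse.unsw.edu.au/~john
print(extract(record, 5))  # None
```

Each call to `extract` walks the record from the header onward, so fetching several attributes one call at a time repeats that walk for every attribute.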
To tackle this retrieval cost, a new operator (called EXTRACT) is introduced into query plans; it precedes any reference to attributes stored in the interpreted format and returns the offsets of the referenced interpreted attribute values, which are then used to retrieve the values. Value extraction from an interpreted record is a potentially expensive operation that depends on the number of attributes stored in a row or the length of the tuple. Moreover, if a query evaluation plan fetches each attribute individually and uses one EXTRACT call per attribute, the record is scanned once for each attribute, which is very slow. Thus, a batch EXTRACT technique is used, which retrieves all requested values in a single scan of the record.

6. CONCLUDING REMARKS
RDF is a main foundation for processing the semantics of information stored on the Web. It is the data model behind the Semantic Web vision, whose goal is to enable integration and sharing of data across different applications and organizations. The naive way to store a set of RDF statements is to use a relational database with a single table that includes columns for subject, property and object. While simple, this schema quickly hits scalability limitations. Therefore, several approaches have been proposed to deal with this limitation, either by using an extensive set of indexes or by using selectivity estimation to optimize join ordering [20, 28]. Another approach to reducing the self-join problem is to create separate tables (property tables) for subjects that tend to have common properties defined [8, 15]. Since Semantic Web data is often semi-structured, storing it in a row store can result in very sparse tables as more subjects or properties are added. Hence, this normalization technique is typically limited to resources that contain a similar set of properties, and many small tables are usually created. The problem is that this may result in union and join clauses in queries, since information about a particular subject may be located in many different property tables. This may complicate the plan generator and query optimizer and can degrade performance.

Abadi et al. [1] have explored the trade-off between triple-based stores and binary-table-based stores of RDF data. The main advantages of binary tables are:
• Improved bandwidth utilization: in a column store, only those attributes that are accessed by a query need to be read off disk. In a row store, surrounding attributes also need to be read, since an attribute is generally smaller than the smallest granularity at which data can be accessed.
• Improved data compression: storing data from the same attribute domain together increases locality and thus the data compression ratio. Hence, bandwidth requirements are further reduced when transferring compressed data.
On the other hand, binary tables have the following main disadvantages:
• Increased cost of inserts: column stores perform poorly for insert queries, since multiple distinct locations on disk have to be updated for each inserted tuple (one for each attribute).
• Increased tuple reconstruction costs: in order for column stores to offer a standards-compliant relational database interface (e.g., ODBC, JDBC), they must at some point in a query plan stitch values from multiple columns together into a row-store-style tuple to be output from the database.
Abadi et al. [1] have reported that the performance of binary tables is superior to that of clustered property tables, while Sidirourgos et al. [25] reported that, even in a column-store database, binary tables do not always outperform clustered property tables, and that the outcome depends on the characteristics of the data set.
Moreover, the experiments of [1] reported that storing RDF data in a column-store database is better than storing it in a row-store database, while the experiments of [25] have shown that the performance gain of a column-store database depends on the number of predicates in a data set. Other independent benchmarking projects [4, 23, 24] have shown that no approach is dominant for all queries and that none of these approaches can compete with a purely relational model. This suggests that there is still room for optimization in the proposed generic relational RDF storage schemes, and thus new techniques for storing and querying RDF data are still required to bring forward the Semantic Web vision.

7. REFERENCES
[1] Daniel J. Abadi, Adam Marcus, Samuel Madden, and Kate Hollenbach. SW-Store: a vertically partitioned DBMS for Semantic Web data management. VLDB Journal, 18(2):385–406, 2009.
[2] Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, and Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. In Proceedings of the 2nd International Workshop on the Semantic Web (SemWeb), 2001.
[3] Jennifer L. Beckmann, Alan Halverson, Rajasekar Krishnamurthy, and Jeffrey F. Naughton. Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), page 58, 2006.
[4] Christian Bizer and Andreas Schultz. Benchmarking the Performance of Storage Systems that expose SPARQL Endpoints. In Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), 2008.
[5] Viorica Botea, Daniel Mallett, Mario A. Nascimento, and Jörg Sander. PIST: An Efficient and Practical Indexing Technique for Historical Spatio-Temporal Point Data. GeoInformatica, 12(2):143–168, 2008.
[6] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In Proceedings of the First International Semantic Web Conference (ISWC), pages 54–68, 2002.
[7] Surajit Chaudhuri and Gerhard Weikum. Rethinking Database System Architecture: Towards a Self-Tuning RISC-Style Database System. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB), pages 1–10, 2000.
[8] Eugene Inseok Chong, Souripriya Das, George Eadon, and Jagannathan Srinivasan. An Efficient SQL-based RDF Querying Scheme. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 1216–1227, 2005.
[9] Eric Chu, Jennifer L. Beckmann, and Jeffrey F. Naughton. The case for a wide-table approach to manage sparse relational data sets. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 821–832, 2007.
[10] George P. Copeland and Setrag Khoshafian. A Decomposition Storage Model. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 268–279, 1985.
[11] Torsten Grust, Sherif Sakr, and Jens Teubner. XQuery on SQL Hosts. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB), pages 252–263, 2004.
[12] Stephen Harris and Nicholas Gibbins. 3store: Efficient Bulk RDF Storage. In Proceedings of the First International Workshop on Practical and Scalable Semantic Systems (PSSS), 2003.
[13] Andreas Harth and Stefan Decker. Optimized Index Structures for Querying RDF from the Web. In Proceedings of the Third Latin American Web Congress (LA-WEB), pages 71–80, 2005.
[14] Shawn R. Jeffery, Michael J. Franklin, and Alon Y. Halevy. Pay-as-you-go user feedback for dataspace systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 847–860, 2008.
[15] Justin J. Levandoski and Mohamed F. Mokbel. RDF Data-Centric Storage. In Proceedings of the IEEE International Conference on Web Services (ICWS), 2009.
[16] Li Ma, Zhong Su, Yue Pan, Li Zhang, and Tao Liu. RStar: an RDF storage and query system for enterprise resource management. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pages 484–491, 2004.
[17] Frank Manola and Eric Miller. RDF Primer, W3C Recommendation, February 2004. http://www.w3.org/TR/REC-rdf-syntax/.
[18] Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, and Shunsuke Uemura. A Path-based Relational RDF Database. In Proceedings of the 16th Australasian Database Conference (ADC), pages 95–103, 2005.
[19] Brian McBride. Jena: A Semantic Web Toolkit. IEEE Internet Computing, 6(6):55–59, 2002.
[20] Thomas Neumann and Gerhard Weikum. RDF-3X: a RISC-style engine for RDF. Proceedings of the VLDB Endowment (PVLDB), 1(1):647–659, 2008.
[21] Thomas Neumann and Gerhard Weikum. Scalable join processing on very large RDF graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 627–640, 2009.
[22] Eric Prud'hommeaux and Andy Seaborne. SPARQL Query Language for RDF, W3C Recommendation, January 2008. http://www.w3.org/TR/rdf-sparql-query/.
[23] Michael Schmidt, Thomas Hornung, Norbert Küchlin, Georg Lausen, and Christoph Pinkel. An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario. In Proceedings of the 7th International Semantic Web Conference (ISWC), pages 82–97, 2008.
[24] Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. SP2Bench: A SPARQL Performance Benchmark. In Proceedings of the 25th International Conference on Data Engineering (ICDE), pages 222–233, 2009.
[25] Lefteris Sidirourgos, Romulo Goncalves, Martin L. Kersten, Niels Nes, and Stefan Manegold. Column-store support for RDF data management: not all swans are white. Proceedings of the VLDB Endowment (PVLDB), 1(2):1553–1563, 2008.
[26] Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-Store: A Column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 553–564, 2005.
[27] Can Türker and Michael Gertz. Semantic integrity support in SQL:1999 and commercial (object-)relational database management systems. VLDB Journal, 10(4):241–269, 2001.
[28] Cathrin Weiss, Panagiotis Karras, and Abraham Bernstein. Hexastore: sextuple indexing for semantic web data management. Proceedings of the VLDB Endowment (PVLDB), 1(1):1008–1019, 2008.