Abstract
RDF is increasingly used to encode data for the semantic web and for data exchange. A large body of work addresses RDF data management, following a variety of approaches. This paper provides an overview of that work. The review covers centralized solutions (referred to as warehousing approaches), distributed solutions, and techniques developed for querying linked data. Within each category, further classifications are provided to help readers understand the distinguishing characteristics of the different approaches.
Author information
M. Tamer Özsu is a professor of computer science at the University of Waterloo, Canada. Dr. Özsu’s current research focuses on large scale data distribution, and management of unconventional data (e.g., graphs, RDF, XML, and streams). He is a fellow of ACM and IEEE, an elected member of the Science Academy of Turkey, and a member of Sigma Xi and AAAS.
About this article
Cite this article
Özsu, M.T. A survey of RDF data management systems. Front. Comput. Sci. 10, 418–432 (2016). https://doi.org/10.1007/s11704-016-5554-y
DOI: https://doi.org/10.1007/s11704-016-5554-y