Distributed RDF Query Processing

Sakr, Sherif; Wylot, Marcin; Mutharaju, Raghava; Le Phuoc, Danh; Fundulaki, Irini

doi:10.1007/978-3-319-73515-3_4

Abstract

With increasing sizes of RDF datasets, executing complex queries on a single node has become impractical, especially when the node’s main memory is dwarfed by the volume of the dataset. There is therefore a crucial need for distributed systems with a high degree of parallelism that can satisfy the performance demands of complex SPARQL queries. In this chapter, we give an overview of various techniques and systems for efficiently querying large RDF datasets in distributed environments.

Author information

Authors and Affiliations

College of Public Health & Health Informatics, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Sherif Sakr
Fakultät IV - Open Distributed Systems, Technische Universität Berlin, Berlin, Germany
Marcin Wylot & Danh Le Phuoc
Knowledge Discovery Lab, GE Global Research, Niskayuna, New York, USA
Raghava Mutharaju
Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science (ICS), Heraklion, Greece
Irini Fundulaki

About this chapter

Cite this chapter

Sakr, S., Wylot, M., Mutharaju, R., Le Phuoc, D., Fundulaki, I. (2018). Distributed RDF Query Processing. In: Linked Data. Springer, Cham. https://doi.org/10.1007/978-3-319-73515-3_4

DOI: https://doi.org/10.1007/978-3-319-73515-3_4
Published: 02 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73514-6
Online ISBN: 978-3-319-73515-3
eBook Packages: Computer Science, Computer Science (R0)
