research-article

An empirical evaluation of cost-based federated SPARQL query processing engines

Authors: Umair Qudus, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Young-Koo Lee

Editor: Ruben Verborgh

Semantic Web, Volume 12, Issue 6

Pages 843–868

https://doi.org/10.3233/SW-200420

Published: 01 January 2021

Abstract

Finding a good query plan is key to optimizing query runtime. This holds in particular for cost-based federation engines, which rely on cardinality estimates to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Although informative, these metrics are generic and cannot quantify the accuracy of the cardinality estimators used by cost-based federation engines. To evaluate cost-based federation engines thoroughly, the effect of cardinality estimation errors on overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines on LargeRDFBench queries, using both existing and novel evaluation metrics. Our detailed analysis of the experimental outcomes reveals novel insights that are useful for the development of future cost-based federated SPARQL query processing engines.
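The abstract's central quantity is the gap between a federation engine's estimated cardinality and the actual result size. As an illustration only, the sketch below computes the q-error, a widely used measure of cardinality-estimation accuracy as defined by Moerkotte, Neumann and Steidl (2009); it is not necessarily one of the novel metrics proposed in this article. The function name `q_error`, the clamping of zero cardinalities to 1, and the example triple-pattern figures are assumptions made for this sketch.

```python
from typing import Dict, Tuple


def q_error(estimated: float, actual: float) -> float:
    """Return the q-error max(est/act, act/est); 1.0 means a perfect estimate.

    Assumption for this sketch: cardinalities below 1 (including 0) are
    clamped to 1 so the ratio is always defined.
    """
    est = max(estimated, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


# Hypothetical (estimated, actual) cardinalities for three triple patterns;
# these numbers are invented for illustration, not taken from the article.
patterns: Dict[str, Tuple[float, float]] = {
    "tp1": (100, 400),   # underestimate by a factor of 4
    "tp2": (50, 50),     # exact estimate
    "tp3": (1000, 10),   # overestimate by a factor of 100
}

# q-error per triple pattern, plus the worst case across the query.
errors = {name: q_error(est, act) for name, (est, act) in patterns.items()}
worst = max(errors.values())

for name, err in errors.items():
    print(f"{name}: q-error = {err:.1f}")
print(f"worst-case q-error: {worst:.1f}")
```

Because the q-error is a symmetric ratio, an overestimate and an underestimate by the same factor are penalized equally, which is one reason it is often preferred over a plain difference between estimated and actual cardinalities.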


Cited By

T. Sultana, M. Hossain, M. Morshed, T. Afridi and Y. Lee, Inductive autoencoder for efficiently compressing RDF graphs, Information Sciences: an International Journal 662(C) (2024), https://doi.org/10.1016/j.ins.2024.120210. Online publication date: 1 March 2024.

F. Yang, A. Crainiceanu, Z. Chen and D. Needham, Cluster-based joins for federated SPARQL queries, IEEE Transactions on Knowledge and Data Engineering 35(4) (2023), 3525–3539, https://doi.org/10.1109/TKDE.2021.3135507. Online publication date: 1 April 2023.


Published In

Semantic Web, Volume 12, Issue 6: Storing, Querying, and Benchmarking the Web of Data (2021), 115 pages

ISSN: 1570-0844; EISSN: 2210-4968

© 2021 IOS Press. All rights reserved.

Publisher: IOS Press, Netherlands



