research-article

A fine-grained evaluation of SPARQL endpoint federation systems

Editor: Axel Polleres
Authors: Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, Axel-Cyrille Ngonga Ngomo

Semantic Web, Volume 7, Issue 5

Pages 493 - 518

https://doi.org/10.3233/SW-150186

Published: 01 January 2016

Abstract

The Web of Data has grown enormously in recent years. Currently, it comprises a large compendium of interlinked and distributed datasets from multiple domains. Running complex queries on this compendium often requires accessing data from different endpoints within one query. The abundance of datasets and the need for running complex queries have thus motivated a considerable body of work on SPARQL query federation systems, the dedicated means to access data distributed over the Web of Data. However, the granularity of previous evaluations of such systems has not allowed insights to be derived concerning their behavior in the different steps involved in federated query processing. In this work, we perform extensive experiments to compare state-of-the-art SPARQL endpoint federation systems using the comprehensive performance evaluation framework FedBench. In addition to considering the traditional query runtime as an evaluation criterion, we extend the scope of our performance evaluation by considering criteria that have received little attention in previous studies. In particular, we consider the number of sources selected, the total number of SPARQL ASK requests used, the completeness of answers, and the source selection time. We show that these criteria have a significant impact on the overall query runtime of existing systems. Moreover, we extend FedBench to mirror a highly distributed data environment and assess the behavior of existing systems using the same performance criteria. As a result, we provide a detailed analysis of the experimental outcomes that reveals novel insights for improving current and future SPARQL federation systems.

Published In

Semantic Web Volume 7, Issue 5

2016

93 pages

ISSN:1570-0844

EISSN:2210-4968

IOS Press and the authors. All rights reserved.

Publisher

IOS Press

Netherlands
