research-article

An empirical evaluation of cost-based federated SPARQL query processing engines

Published: 01 January 2021

Abstract

    Finding a good query plan is key to optimizing query runtime. This holds in particular for cost-based federation engines, which rely on cardinality estimates to choose a plan. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected, and number of requests sent. Although informative, these metrics are generic and cannot quantify the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of cardinality estimation errors on overall query runtime must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines on LargeRDFBench queries, using both existing and novel evaluation metrics. Our results provide a detailed analysis of the experimental outcomes that reveals novel insights useful for the development of future cost-based federated SPARQL query processing engines.
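
To make the idea of a cardinality estimation error concrete, here is a minimal sketch of the q-error, a measure commonly used in the database literature. The abstract does not define the paper's novel metrics, so this is an assumption chosen only for illustration and not necessarily what the authors propose. The function name `q_error` and the sample cardinalities are hypothetical.

```python
def q_error(estimated: float, actual: float) -> float:
    """Factor by which an estimate deviates from the true cardinality.

    Both values are clamped to at least 1 so that empty results do not
    cause a division by zero (a common convention, assumed here).
    """
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)


# Hypothetical (estimated, actual) cardinalities for three joins in a plan.
pairs = [(100, 100), (10, 1000), (5000, 50)]
errors = [q_error(est, act) for est, act in pairs]

for (est, act), err in zip(pairs, errors):
    print(f"estimated={est:>5} actual={act:>5} q-error={err:.1f}")
```

The measure is symmetric: underestimating by a factor of 100 and overestimating by a factor of 100 give the same q-error. A perfect estimate scores 1.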

    Published In

    Semantic Web, Volume 12, Issue 6
    Storing, Querying, and Benchmarking the Web of Data
    2021
    115 pages
    ISSN:1570-0844
    EISSN:2210-4968

    Publisher

    IOS Press

    Netherlands

    Author Tags

    1. SPARQL
    2. benchmarking
    3. cost-based
    4. cost-free
    5. federated
    6. querying
