Abstract
To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is impossible for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, introduce novel LTQP algorithms that leverage these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain correct results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
Canonical version: https://comunica.github.io/Article-ISWC2023-SolidQuery/
Acknowledgements
This work is supported by SolidLab Vlaanderen (Flemish Government, EWI and RRF project VV023/10). Ruben Taelman is a postdoctoral fellow of the Research Foundation – Flanders (FWO) (1274521N).
Supplemental Material Statement
Implementation: https://github.com/comunica/comunica-feature-link-traversal
Experiments: https://github.com/comunica/Experiments-Solid-Link-Traversal
Benchmark: https://github.com/SolidBench/SolidBench.js
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Taelman, R., Verborgh, R. (2023). Link Traversal Query Processing Over Decentralized Environments with Structural Assumptions. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_1
DOI: https://doi.org/10.1007/978-3-031-47240-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47239-8
Online ISBN: 978-3-031-47240-4
eBook Packages: Computer Science, Computer Science (R0)