Optimizing RPQs over a compact graph representation

Published: 07 September 2023

Abstract

We propose techniques to evaluate regular path queries (RPQs) over labeled graphs (e.g., RDF). We apply a bit-parallel simulation of a Glushkov automaton representing the query over a ring: a compact wavelet-tree-based index of the graph. To the best of our knowledge, our approach is the first to evaluate RPQs over a compact representation of such graphs, where we show the key advantages of using Glushkov automata in this setting. Our scheme obtains optimal time, in terms of alternation complexity, for traversing the product graph. We further introduce various optimizations, such as the ability to process several automaton states and graph nodes/labels simultaneously, and to estimate relevant selectivities. Experiments show that our approach uses 3–5× less space, and is over 5× faster, on average, than the next best state-of-the-art system for evaluating RPQs.
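As a rough illustration of the bit-parallel idea mentioned in the abstract, the sketch below simulates a Glushkov automaton with integer bitmasks while traversing a small labeled graph. It is not the authors' implementation: plain adjacency lists stand in for the ring index, the automaton for the hypothetical query `a/b*/c` is built by hand, and the names `FOLLOW`, `B`, `reach`, and `eval_rpq` are assumptions made for this example. It relies on the Glushkov property that every transition entering a state carries the same label, so one step is a union of follow sets masked by the edge label.

```python
from collections import defaultdict, deque

# Hand-built Glushkov automaton for the path expression a/b*/c (assumed example).
# State 0 is the initial state; states 1, 2, 3 are the positions of a, b, c.
FOLLOW = [0b0010, 0b1100, 0b1100, 0b0000]    # FOLLOW[i]: bitmask of successors of state i
B = {"a": 0b0010, "b": 0b0100, "c": 0b1000}  # B[label]: states that are entered by label
INITIAL = 0b0001
FINAL = 0b1000


def reach(mask):
    """Return the union of FOLLOW[i] over every state i set in mask."""
    out, i = 0, 0
    while mask:
        if mask & 1:
            out |= FOLLOW[i]
        mask >>= 1
        i += 1
    return out


def eval_rpq(edges, start):
    """Return the nodes reachable from start by a path whose labels match a/b*/c."""
    adj = defaultdict(list)
    for s, label, o in edges:
        adj[s].append((label, o))

    visited = defaultdict(int)  # node -> bitmask of automaton states already seen there
    visited[start] = INITIAL
    queue = deque([(start, INITIAL)])
    answers = set()

    while queue:
        node, states = queue.popleft()
        if states & FINAL:
            answers.add(node)
        targets = reach(states)  # all automaton states reachable in one step
        for label, nxt in adj[node]:
            # Keep states entered by this label that are new for the target node.
            new = targets & B.get(label, 0) & ~visited[nxt]
            if new:
                visited[nxt] |= new
                queue.append((nxt, new))
    return answers


edges = [
    ("x", "a", "y"), ("y", "b", "y"), ("y", "b", "z"),
    ("z", "c", "w"), ("y", "c", "v"), ("x", "c", "u"),
]
print(sorted(eval_rpq(edges, "x")))  # ['v', 'w']
```

Each queue entry carries a whole set of automaton states, so several states are advanced with a handful of bitwise operations, and a (node, state) pair is never expanded twice.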

Cited By

  • (2025) Evaluating regular path queries on compressed adjacency matrices. The VLDB Journal — The International Journal on Very Large Data Bases 34:1. DOI: 10.1007/s00778-024-00885-6. Online publication date: 1-Jan-2025
  • (2024) Tackling Challenges in Implementing Large-Scale Graph Databases. Communications of the ACM 67:8 (40-44). DOI: 10.1145/3653314. Online publication date: 16-Jul-2024
  • (2024) Compressed Graph Representations for Evaluating Regular Path Queries. String Processing and Information Retrieval (218-232). DOI: 10.1007/978-3-031-72200-4_17. Online publication date: 23-Sep-2024

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 33, Issue 2
Mar 2024
305 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 07 September 2023
Accepted: 17 August 2023
Revision received: 13 June 2023
Received: 04 November 2022

Author Tags

  1. Regular path queries
  2. Ring index
  3. Succinct data structures
  4. Graph databases

Qualifiers

  • Research-article
