
The complexity of regular expressions and property paths in SPARQL

Published: 04 December 2013

    Abstract

    The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner.
    We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of: (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately.
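    To make problems (1) and (2) concrete, here is a minimal Python sketch; it is not the paper's construction. It assumes the expression r has already been compiled by hand into an NFA given as a hypothetical triple (start, accepting, delta), and that the graph is a list of (source, label, target) edges. The first function decides problem (1) when paths may repeat nodes, which is one common reading of an "alternative semantics" (an assumption here), using a breadth-first search over (node, state) pairs. The second counts matching paths by brute force when nodes may not repeat, as an illustration of a more restrictive semantics; it is not claimed to be the W3C definition.

```python
from collections import deque

def _adjacency(edges):
    """Index edges as node -> list of (label, target)."""
    adj = {}
    for source, label, target in edges:
        adj.setdefault(source, []).append((label, target))
    return adj

def exists_matching_path(edges, nfa, x, y):
    """Problem (1), arbitrary paths allowed: BFS over (node, NFA state) pairs.

    Runs in time polynomial in the graph size times the number of NFA states.
    """
    start, accepting, delta = nfa
    adj = _adjacency(edges)
    seen = {(x, start)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if node == y and state in accepting:
            return True
        for label, nxt in adj.get(node, []):
            for q in delta.get((state, label), ()):
                if (nxt, q) not in seen:
                    seen.add((nxt, q))
                    queue.append((nxt, q))
    return False

def count_simple_matching_paths(edges, nfa, x, y):
    """Problem (2), restricted to paths that never repeat a node.

    Brute-force enumeration, exponential in the worst case. A set of NFA
    states is tracked so that each path is counted once, not once per run.
    """
    start, accepting, delta = nfa
    adj = _adjacency(edges)

    def step(states, label):
        return frozenset(q for s in states for q in delta.get((s, label), ()))

    def dfs(node, states, visited):
        total = 1 if node == y and states & accepting else 0
        for label, nxt in adj.get(node, []):
            if nxt in visited:
                continue
            nxt_states = step(states, label)
            if nxt_states:
                total += dfs(nxt, nxt_states, visited | {nxt})
        return total

    return dfs(x, frozenset([start]), frozenset([x]))

# Toy data (hypothetical): a small cyclic graph and an NFA for (aa)*.
EDGES = [(1, "a", 2), (2, "a", 3), (1, "a", 3), (3, "a", 1)]
EVEN_A = (0, {0}, {(0, "a"): {1}, (1, "a"): {0}})

print(exists_matching_path(EDGES, EVEN_A, 1, 2))         # True
print(count_simple_matching_paths(EDGES, EVEN_A, 1, 2))  # 0
```

    On the toy graph, an even-length path from node 1 to node 2 exists only by going around the cycle (1, 2, 3, 1, 2), so the answer changes once repeated nodes are forbidden. This small gap hints at why the choice of semantics affects complexity.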
    As a side-result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
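    One way to see why such a membership test can run in polynomial time is sketched below in Python. This is an illustrative dynamic program under stated assumptions, not the proof from the paper. Expressions are hypothetical nested tuples: ("sym", a), ("cat", r, s), ("alt", r, s), ("not", r), and ("count", r, low, high), where high may be None for "unbounded". For each subexpression, the sketch computes the set of index pairs (i, j) such that word[i:j] matches. A word of length n can be split into at most n non-empty pieces, so counter values above n + 1 can be capped; the running time then does not depend on how large the counters are.

```python
def matches(expr, word):
    """Decide whether `word` matches `expr`, with counters and negation."""
    n = len(word)
    identity = {(i, i) for i in range(n + 1)}
    all_pairs = {(i, j) for i in range(n + 1) for j in range(i, n + 1)}

    def compose(r, s):
        """Relational composition: (i, k) if (i, j) in r and (j, k) in s."""
        by_start = {}
        for j, k in s:
            by_start.setdefault(j, []).append(k)
        return {(i, k) for i, j in r for k in by_start.get(j, ())}

    def spans(e):
        """All (i, j) such that word[i:j] is in the language of e."""
        op = e[0]
        if op == "sym":
            return {(i, i + 1) for i in range(n) if word[i] == e[1]}
        if op == "cat":
            return compose(spans(e[1]), spans(e[2]))
        if op == "alt":
            return spans(e[1]) | spans(e[2])
        if op == "not":
            # Complement relative to all substrings of `word`.
            return all_pairs - spans(e[1])
        if op == "count":
            _, sub, low, high = e
            base = spans(sub)
            cap = n + 1  # the t-th power equals the (n+1)-th for t > n
            low = min(low, cap)
            high = cap if high is None else min(high, cap)
            power, result = identity, set()
            for t in range(high + 1):
                if t >= low:
                    result |= power      # power == base composed t times
                power = compose(power, base)
            return result
        raise ValueError(f"unknown operator: {op!r}")

    return (0, n) in spans(expr)

# Hypothetical examples: a{2,3}b, and the negation of a{10**9,}.
A = ("sym", "a")
TWO_OR_THREE_A_THEN_B = ("cat", ("count", A, 2, 3), ("sym", "b"))
NOT_A_BILLION_A = ("not", ("count", A, 10**9, None))

print(matches(TWO_OR_THREE_A_THEN_B, "aab"))    # True
print(matches(TWO_OR_THREE_A_THEN_B, "aaaab"))  # False
print(matches(NOT_A_BILLION_A, "aaa"))          # True
```

    In this encoding, the Kleene star r* is ("count", r, 0, None) and the optional r? is ("count", r, 0, 1).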

    Published In

    ACM Transactions on Database Systems  Volume 38, Issue 4
    Invited papers issue
    November 2013
    294 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/2539032
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 December 2013
    Accepted: 01 May 2013
    Received: 01 September 2012
    Published in TODS Volume 38, Issue 4

    Author Tags

    1. Graph databases
    2. query evaluation
    3. regular expressions

    Qualifiers

    • Research-article
    • Research
    • Refereed
