Abstract
We study the containment, satisfiability, and validity problems for conjunctive queries over trees with respect to a schema. We show that conjunctive query containment and validity are 2EXPTIME-complete with respect to a schema, both when the schema is given as a DTD and when it is given as a tree automaton. Furthermore, we show that satisfiability for conjunctive queries with respect to a schema can be decided in NP. The problem is NP-hard already for queries using only one kind of axis. Finally, we consider conjunctive queries that can test for equalities and inequalities of data values. Here, satisfiability and validity are decidable, but containment is undecidable, even without schema information. On the other hand, containment with respect to a schema becomes decidable again if the “larger” query is not allowed to use both equalities and inequalities.
Notes
We do not require CQs to be in prenex normal form. However, all formulas that we construct in the paper can be put in prenex normal form by simply renaming the variables and moving the quantifiers.
Notice that, as stated in the introduction, we assume that trees only take labels from a finite alphabet \(\varSigma \). Hence, for a conjunctive query Q, the set L(Q) also consists of trees over alphabet \(\varSigma \). In the rare cases where we consider trees without schema information, we state this explicitly.
Transforming an NTA to a reduced NTA can be done in polynomial time by first performing an emptiness test for every state of A, followed by a reachability test. Section 4.2 of [35] describes an algorithm for reducing a DTD. The algorithm for NTAs is analogous.
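The two-step reduction described in this note can be sketched as follows. This is only an illustrative sketch under assumptions that are not taken from the paper: the encoding of rules as a dictionary from (state, label) to a horizontal NFA `(start, accepting, edges)`, and the names `reduce_nta`, `_closure`, `_live_edges`, and `_usable_children` are all hypothetical. The first loop is the emptiness test per state, the second part is the reachability test from the final states.

```python
from collections import deque

# Hypothetical encoding (an assumption, not the paper's notation):
# rules maps (state, label) to a horizontal NFA (start, accepting, edges),
# where edges is a set of triples (source, child_state, target).

def _closure(start_nodes, step):
    """Return all nodes reachable from start_nodes via the step function."""
    seen, queue = set(start_nodes), deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in step(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def _live_edges(nfa, allowed):
    """Edges of the NFA whose symbol is one of the allowed states."""
    return [e for e in nfa[2] if e[1] in allowed]

def _usable_children(nfa, allowed):
    """Child states on some accepting path that reads only allowed states."""
    start, accepting, _ = nfa
    live = _live_edges(nfa, allowed)
    fwd = _closure([start], lambda s: [t for (u, _, t) in live if u == s])
    bwd = _closure(list(accepting), lambda t: [u for (u, _, v) in live if v == t])
    return {q for (u, q, t) in live if u in fwd and t in bwd}

def reduce_nta(rules, final):
    """Return the states that are non-empty and reachable from a final state."""
    # Step 1 (emptiness test): a state is non-empty if one of its horizontal
    # NFAs accepts a word consisting only of states already known non-empty.
    nonempty, changed = set(), True
    while changed:
        changed = False
        for (q, _label), nfa in rules.items():
            if q in nonempty:
                continue
            live = _live_edges(nfa, nonempty)
            fwd = _closure([nfa[0]], lambda s: [t for (u, _, t) in live if u == s])
            if any(f in fwd for f in nfa[1]):
                nonempty.add(q)
                changed = True

    # Step 2 (reachability test): walk downwards from the non-empty final states.
    def children(p):
        found = set()
        for (p2, _label), nfa in rules.items():
            if p2 == p:
                found |= _usable_children(nfa, nonempty)
        return found

    return _closure([q for q in final if q in nonempty], children)
```

Both steps are plain graph searches repeated at most once per state, so the sketch stays polynomial in the size of the automaton, although it makes no attempt to be efficient.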
To the best of our knowledge, the full proof is unpublished. For the convenience of our readers, we include Wood’s proof, which he kindly provided in a personal communication.
We assume \(\varDelta \) to contain all the data values we use in our proofs and examples.
This definition is tailored to the proof of Theorem 23 and is therefore more complicated than a reader might expect. In that proof, the reader should think of \(x'\) and \(y'\) as being mapped to the same node.
Of course, the resulting equality atoms can be removed by suitable variable renaming.
For the purpose of the reduction, a node v of the query is a leaf node if and only if the query does not have any atom of the form \(\textit{Child}\,(v,w)\) or \(\textit{Child}\,^+(v,w)\).
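As a small illustration of this leaf condition, the following sketch computes the leaf variables of a query. The representation of atoms as triples `(axis, v, w)`, the function name `leaf_variables`, and the axis name `NextSibling` in the usage example are assumptions made for this example only.

```python
def leaf_variables(variables, atoms):
    """Return the variables v with no atom Child(v, w) or Child+(v, w).

    atoms is an iterable of triples (axis, v, w); this encoding is a
    hypothetical one chosen for the example.
    """
    has_downward_atom = {v for (axis, v, _w) in atoms
                         if axis in ("Child", "Child+")}
    return set(variables) - has_downward_atom


# Usage: x has a Child atom and y has a Child+ atom, so only z and u are leaves.
query_atoms = [("Child", "x", "y"), ("Child+", "y", "z"), ("NextSibling", "z", "u")]
leaves = leaf_variables({"x", "y", "z", "u"}, query_atoms)
```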
We can assume w.l.o.g. that the free variables are the same in P and Q.
Here, structural constraints include node identities and VBCs allow comparison of data values to constants.
References
Abiteboul, S., Bourhis, P., Muscholl, A., Wu, Z.: Recursive queries on trees and data trees. In: International Conference on Database Theory (ICDT), pp. 93–104 (2013)
Arenas, M., Barceló, P., Libkin, L., Murlak, F.: Foundations of Data Exchange. Cambridge University Press, Cambridge (2014)
Barceló, P., Libkin, L., Poggi, A., Sirangelo, C.: XML with incomplete information. J. ACM 58(1), 4 (2010)
Benedikt, M., Bourhis, P., Senellart, P.: Monadic datalog containment. In: International Colloquium on Automata, Languages, and Programming (ICALP), pp. 79–91 (2012)
Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2), Art. no. 8 (2008). doi:10.1145/1346330.1346333
Berglund, A., Boag, S., Chamberlin, D., Fernández, M.F., Kay, M., Robie, J., Siméon, J.: XML path language (XPath) 2.0. Technical report, World Wide Web Consortium (2007). http://www.w3.org/TR/xpath20/
Björklund, H., Martens, W., Schwentick, T.: Conjunctive query containment over trees. J. Comput. Syst. Sci. 77(3), 450–472 (2011)
Björklund, H., Martens, W., Schwentick, T.: Validity of tree pattern queries with respect to schema information. In: Mathematical Foundations of Computer Science (MFCS), pp. 171–182 (2013)
Bojanczyk, M., Kolodziejczyk, L.A., Murlak, F.: Solutions in XML data exchange. J. Comput. Syst. Sci. 79(6), 785–815 (2013)
Bojanczyk, M., Murlak, F., Witkowski, A.: Containment of monadic datalog programs via bounded clique-width. In: International Colloquium on Automata, Languages, and Programming (ICALP), pp. 427–439 (2015)
Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and XML reasoning. J. ACM 56(3), Art. no. 13 (2009). doi:10.1145/1516512.1516515
Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 142(2), 182–206 (1998)
Chandra, A.K., Kozen, D.C., Stockmeyer, L.J.: Alternation. J. ACM 28(1), 114–133 (1981)
Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: STOC, pp. 77–90 (1977)
Chlebus, B.S.: Domino-tiling games. J. Comput. Syst. Sci. 32(3), 374–392 (1986)
Clark, J., Murata, M.: Relax NG specification (2001). http://www.relaxng.org/spec-20011203.html
Czerwinski, W., David, C., Losemann, K., Martens, W.: Deciding definability by deterministic regular expressions. In: International Conference on Foundations of Software Science and Computation Structures (FOSSACS), pp. 289–304. Springer, Berlin (2013)
Czerwinski, W., Martens, W., Niewerth, M., Parys, P.: Minimization of tree pattern queries. In: Symposium on Principles of Database Systems (PODS), pp. 43–54 (2016)
Czerwinski, W., Martens, W., Parys, P., Przybylko, M.: The (almost) complete guide to tree pattern containment. In: Symposium on Principles of Database Systems (PODS), pp. 117–130 (2015)
David, C.: Complexity of data tree patterns over XML documents. In: MFCS, pp. 278–289 (2008)
David, C., Gheerbrant, A., Libkin, L., Martens, W.: Containment of pattern-based queries over data trees. In: International Conference on Database Theory (ICDT), pp. 201–212 (2013)
David, C., Hofman, P., Murlak, F., Pilipczuk, M.: Synthesizing transformations from XML schema mappings. In: International Conference on Database Theory (ICDT), pp. 61–71 (2014)
David, C., Libkin, L., Murlak, F.: Certain answers for XML queries. In: Symposium on Principles of Database Systems (PODS), pp. 191–202 (2010)
Flum, J., Frick, M., Grohe, M.: Query evaluation via tree-decompositions. J. ACM 49(6), 716–752 (2002)
Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980)
Geerts, F., Fan, W.: Satisfiability of XPath queries with sibling axes. In: DBPL, pp. 122–137 (2005)
Gheerbrant, A., Libkin, L., Tan, T.: On the complexity of query answering over incomplete XML documents. In: ICDT, pp. 169–181 (2012)
Gottlob, G., Koch, C., Schulz, K.U.: Conjunctive queries over trees. J. ACM 53(2), 238–272 (2006)
Hidders, J.: Satisfiability of XPath expressions. In: DBPL, pp. 21–36 (2003)
Kimelfeld, B., Sagiv, Y.: Revisiting redundancy and minimization in an XPath fragment. In: Extending Database Technology (EDBT), pp. 61–72 (2008)
Kolaitis, P.G., Vardi, M.Y.: Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61(2), 302–332 (2000)
Lakshmanan, L.V.S., Ramesh, G., Wang, H., Zhao, Z.: On testing satisfiability of tree pattern queries. In: VLDB, pp. 120–131 (2004)
Lu, P., Bremer, J., Chen, H.: Deciding determinism of regular languages. Theory Comput. Syst. 57(1), 97–139 (2015). doi:10.1007/s00224-014-9576-2
Martens, W., Neven, F.: On the complexity of typechecking top-down XML transformations. Theor. Comput. Sci. 336(1), 153–180 (2005)
Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)
Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)
Marx, M.: Conditional XPath. ACM TODS 30(4), 929–959 (2005)
Miklau, G., Suciu, D.: Containment and equivalence for a fragment of XPath. J. ACM 51(1), 2–45 (2004)
Murlak, F., Oginski, M., Przybylko, M.: Between tree patterns and conjunctive queries: Is there tractability beyond acyclicity? In: Mathematical Foundations of Computer Science (MFCS), pp. 705–717 (2012)
Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Log. Methods Comput. Sci. 2(3), Art. no. 1 (2006). doi:10.2168/LMCS-2(3:1)2006
Post, E.L.: A variant of a recursively unsolvable problem. Bull. AMS 52(4), 264–268 (1946)
Räihä, K.J., Ukkonen, E.: The shortest common supersequence problem over binary alphabet is NP-complete. Theor. Comput. Sci. 16(2), 187–198 (1981)
Takahashi, M.: Generalizations of regular sets and their application to a study of context-free languages. Inf. Control 27(1), 1–36 (1975)
ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of XPath 2. J. ACM 56(6), Art. no. 31 (2009). doi:10.1145/1568318.1568321
Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Syst. Theory 2(1), 57–81 (1968)
Vardi, M.Y.: Reasoning about the past with two-way automata. In: Proceedings of the 25th International Colloquium on Automata, Languages and Programming (ICALP’98), Aalborg, Denmark, July 13–17, 1998, pp. 628–641 (1998)
Wood, P.T.: Containment for XPath fragments under DTD constraints. In: ICDT, 2003. Full version, obtained through personal communication (2003)
Acknowledgments
This work was supported by grant number MA 4938/2-1 from the Deutsche Forschungsgemeinschaft (Emmy Noether Nachwuchsgruppe) and the Swedish Research Council Grant 621-2011-6080.
Additional information
A preliminary version of this work was presented at the 33rd International Symposium on Mathematical Foundations of Computer Science.
About this article
Cite this article
Björklund, H., Martens, W. & Schwentick, T. Conjunctive query containment over trees using schema information. Acta Informatica 55, 17–56 (2018). https://doi.org/10.1007/s00236-016-0282-1
DOI: https://doi.org/10.1007/s00236-016-0282-1