
Efficient reasoning about data trees via integer linear programming

Published: 06 September 2012

Abstract

Data trees provide a standard abstraction of XML documents with data values: they are trees whose nodes, in addition to the usual labels, can carry labels from an infinite alphabet (data). Therefore, one is interested in decidable formalisms for reasoning about data trees. While some are known—such as the two-variable logic—they tend to be of very high complexity, and most decidability proofs are highly nontrivial. We are therefore interested in reasonable complexity formalisms as well as better techniques for proving decidability.
Here we show that many decidable formalisms for data trees are subsumed—fully or partially—by the power of tree automata together with set constraints and linear constraints on cardinalities of various sets of data values. All these constraints can be translated into instances of integer linear programming, giving us an NP upper bound on the complexity of the reasoning tasks. We prove that this bound, as well as the key encoding technique, remain very robust, and allow the addition of features such as counting of paths and patterns, and even a concise encoding of constraints, without increasing the complexity. The NP bound is tight, as we also show that the satisfiability of a single set constraint is already NP-hard.
We then relate our results to several reasoning tasks over XML documents, such as satisfiability of schemas and data dependencies and satisfiability of the two-variable logic. As a final contribution, we describe experimental results based on the implementation of some reasoning tasks using the SMT solver Z3.
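
As a rough illustration of how set constraints and cardinality constraints on data values can be reduced to integer linear constraints, the sketch below uses one nonnegative integer variable per nonempty Venn region of two value sets. It is a toy under stated assumptions, not the encoding from the article: the labels `a` and `b`, the fixed node counts `n_a` and `n_b` (in the article's setting such counts would come from a tree automaton), the three example constraints, and the brute-force search standing in for an ILP or SMT solver are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical setup: V_a and V_b are the sets of data values that occur
# on nodes labelled "a" and "b". Each nonempty Venn region of these two
# sets gets one nonnegative integer variable holding its size.
REGIONS = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]


def card(label, z):
    """Return |V_label| as a linear sum of the region variables."""
    return sum(z[r] for r in REGIONS if label in r)


def satisfiable(n_a, n_b):
    """Look for region sizes that satisfy three example constraints.

    n_a, n_b: assumed numbers of a-nodes and b-nodes (fixed here).
    Constraints, all linear in the region variables:
      * inclusion V_a subset of V_b: the region "a only" is empty;
      * key on b (all b-nodes carry distinct values): |V_b| == n_b;
      * a-nodes carry between 1 and n_a values (0 if there are none).
    Brute force replaces a real ILP solver in this sketch.
    """
    bound = max(n_a, n_b)
    low_a = 1 if n_a > 0 else 0
    for sizes in product(range(bound + 1), repeat=len(REGIONS)):
        z = dict(zip(REGIONS, sizes))
        inclusion_ok = z[frozenset({"a"})] == 0
        key_ok = card("b", z) == n_b
        a_ok = low_a <= card("a", z) <= n_a
        if inclusion_ok and key_ok and a_ok:
            return z
    return None


if __name__ == "__main__":
    print(satisfiable(3, 2))  # some witness assignment of region sizes
    print(satisfiable(2, 0))  # no witness: a-values need a b-node
```

With three a-nodes and two b-nodes a witness exists, whereas two a-nodes and no b-nodes is unsatisfiable because the inclusion forces every value on an a-node to appear on some b-node. In general the number of Venn regions grows exponentially with the number of sets, so a practical encoding has to be more economical than this enumeration.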



Published In

ACM Transactions on Database Systems, Volume 37, Issue 3
August 2012
191 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2338626

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 September 2012
Accepted: 01 June 2012
Revised: 01 April 2012
Received: 01 October 2011

Author Tags

  1. Presburger arithmetic
  2. XML
  3. data values
  4. integer linear programming
  5. reasoning
  6. tree languages

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Parikh’s Theorem Made Symbolic. Proceedings of the ACM on Programming Languages 8:POPL (1945-1977). DOI: 10.1145/3632907. Online publication date: 5-Jan-2024
  • (2019) CSS Minification via Constraint Solving. ACM Transactions on Programming Languages and Systems 41:2 (1-76). DOI: 10.1145/3310337. Online publication date: 19-Jun-2019
  • (2018) Satisfiability of Xpath on data trees. ACM SIGLOG News 5:2 (4-16). DOI: 10.1145/3212019.3212021. Online publication date: 30-Apr-2018
  • (2014) Extending two-variable logic on data trees with order on data values and its automata. ACM Transactions on Computational Logic 15:1 (1-39). DOI: 10.1145/2559945. Online publication date: 6-Mar-2014
  • (2014) Forward and backward application of symbolic tree transducers. Acta Informatica 51:5 (297-325). DOI: 10.1007/s00236-014-0197-7. Online publication date: 1-Aug-2014
  • (2013) On XPath with transitive axes and data tests. Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (249-260). DOI: 10.1145/2463664.2463675. Online publication date: 22-Jun-2013
  • (2013) Static Analysis and Query Answering for Incomplete Data Trees with Constraints. In Search of Elegance in the Theory and Practice of Computation (273-290). DOI: 10.1007/978-3-642-41660-6_15. Online publication date: 2013
